Campus artificial intelligence research group creates data set for self-driving cars

Berkeley DeepDrive/Courtesy

The Berkeley Artificial Intelligence Research Lab, or BAIR, released a study May 12 on BDD100K, a driving database that can be used to train the algorithms behind self-driving cars.

According to BAIR’s website, the data set is intended for training self-driving cars’ artificial intelligence programs. The study concluded that the data set can help researchers understand how different driving scenarios affect current self-driving car programs.

The study, written by the research team that created the data set, described two contributions to self-driving car research: the data set itself and a video annotation system. According to BAIR’s website, BDD100K is “the largest and most diverse driving video dataset,” containing 100,000 driving clips. Each clip is about 40 seconds long and captures a variety of scenarios and driving conditions in different locations.

“It is great to have more open data out there for real-world domain problems, in this case for the autonomous driving space,” Cathy Wu, a doctoral student in UC Berkeley’s department of electrical engineering and computer sciences, said in an email. “This will allow the research community to really see if their methods really generalize or apply in real-world domains, and allows for the development of new methods.”

BDD100K’s 100,000 videos cover multiple cities, weather conditions, times of day and scene types, according to BAIR’s website. The videos can be useful for imitation learning, in which an artificial intelligence program learns by mimicking human behavior. The videos were recorded in locations ranging from New York to the Bay Area.

To help show the diversity and number of objects in the data set, BAIR annotated a frame taken about 10 seconds into each video with two-dimensional bounding boxes. Those boxes were used to identify buses, traffic lights, traffic signs, people, bikes, other vehicles and riders. Most notably, the 100,000 videos included more than 1 million cars, according to the website.
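As a rough illustration of how bounding-box labels can be tallied to gauge object diversity, the following Python sketch counts categories across a handful of annotated frames. The `Box2D` record, its field names and the sample values are hypothetical, made up for this example; they are not BDD100K’s actual label format.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Box2D:
    """One hypothetical 2D bounding box: a category plus pixel corners."""
    category: str
    x1: float
    y1: float
    x2: float
    y2: float

    def area(self) -> float:
        # Width times height; clamp at zero so malformed boxes never go negative.
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)


def count_categories(frames: list[list[Box2D]]) -> Counter:
    """Count how many boxes of each category appear across all annotated frames."""
    counts = Counter()
    for boxes in frames:
        counts.update(box.category for box in boxes)
    return counts


# Two made-up annotated frames, one per video (illustrative values only).
frames = [
    [Box2D("car", 100, 200, 300, 350), Box2D("traffic light", 400, 50, 420, 100)],
    [
        Box2D("car", 10, 220, 180, 330),
        Box2D("car", 500, 210, 640, 320),
        Box2D("person", 350, 200, 380, 300),
    ],
]

counts = count_categories(frames)
for category, n in counts.most_common():
    print(f"{category}: {n}")
```

Run over every annotated frame in a data set, a tally like this would show which object types dominate, such as the large number of cars reported on the website.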

“This dataset is much larger and diverse than other available academic datasets for autonomous driving — which is great,” Wu said. “This is also a dataset that is grounded in a truly real-world problem, rather than a dataset collected from images on the internet.”

According to the study, the researchers also created an annotation system that allows objects to be identified more accurately and efficiently. The system lets curved boundaries be added around objects, which helps capture the diversity of objects for object detection.

The study states that the data set comes with comprehensive annotations and can serve as a “benchmark” for other research initiatives. Wu added that many researchers can benefit from the data set, as it allows them to test their techniques for self-driving cars.

“This dataset may also provide a means of comparing machine learning techniques in a standardized manner,” Wu said in the email. “The subsequent research results may in turn shape the development of autonomous driving in industry as well.”

Contact Suryan Bhatia and Yao Huang at [email protected].