From the backlogs of history books to the latest Yak, the world is full of data, and for those who can read the patterns, possibilities for analyzing human behavior abound.
UC Berkeley is piloting a class this fall that faculty say will teach students how to engage with this information in a digitized world, where data are increasingly ubiquitous.
The new four-unit course, “Foundations of Data Science” — cross-listed as Statistics 94 and Computer Science 94 — combines introductory statistics and computational concepts with hands-on work involving hard data that brings “real-world relevance,” according to the program’s website. The course is a part of the new Data Science Education Program, a project that was initiated last year in response to strong student interest in learning programming and statistics.
“We are surrounded by vast amounts of data — you cannot escape it,” said Ani Adhikari, the principal instructor for the course, in her first lecture to the class. “Your iTunes library, your photo collections, your Facebook friends themselves form a data set.”
During that class period, Adhikari demonstrated what one could learn about a story such as “Huckleberry Finn” purely by parsing through the text and analyzing demographic and statistical information.
David Culler, a professor in the campus’s department of electrical engineering and computer sciences, said in an email that over the past seven years, annual undergraduate enrollment in introductory computer science courses has grown from less than 10 percent of the freshman class to more than 50 percent. The rapid crowding of these courses and of statistics courses has left both departments tasked with accommodating skyrocketing demand.
There has been massive growth in job opportunities in data-science-related fields, including academic research, and a shortage of people prepared to fill them, according to Culler.
“(In data sets) there may be some very illuminating insights about human behavior,” Adhikari said. “Those data need to be looked at not just by mathematicians … but people who bring varied perspectives, people who will ask good questions, and that’s very important.”
Cathryn Carson, a campus associate history professor who helped plan the course curriculum, said a key aspect of the curriculum is the integration of the main course with a series of “connector courses,” which are two-unit area-specific courses in subjects such as history, law and health. Instructors intend the connector courses to give students an opportunity to apply the skills in the context of their own interests.
“It really is geared to people who have not taken any previous stats classes or programming or anything,” said UC Berkeley sophomore Priyanka Bhoj, who is taking the course.
Adhikari thinks the course may take some pressure off classes on programming and statistics, which are chronically overbooked.
Bob Jacobsen, campus interim dean of undergraduate studies, said faculty will use feedback from the first semester to develop potential plans for data science major or minor curricula.
The class currently has approximately 110 students, but Jacobsen confirmed that the campus plans to increase enrollment in subsequent semesters based on student demand.