Four years ago, about 110 UC Berkeley students enrolled in the first edition of a class called Computer Science 94. Today that class is better known as Data 8, one of the largest and fastest-growing classes on campus. As the introductory class for UC Berkeley's new data science major, Data 8 has come to symbolize the growing interest in data science, a field that combines computer science principles with statistical analysis. The major, offered through the Division of Data Sciences, is on track to become one of the most popular majors on campus.
Today, we engage in conversation with one of the architects of the program — professor Ani Adhikari. Currently the chair of the Undergraduate Program Committee in the department of statistics, Adhikari has been at the forefront of both designing and popularizing the data science division and major. The passion and exuberance with which she approaches her teaching do not go unnoticed by the student body, which voted her “best professor” in The Daily Californian’s 2019 edition of Best of Berkeley. In our conversation, we discussed her academic background in the field and insights she has for prospective students, as well as some parting thoughts for upcoming data science graduates — the second class to graduate with such a degree from UC Berkeley.
Adhikari spent her undergraduate years at the Indian Statistical Institute in Kolkata, taking classes across mathematics, statistics and probability. She then continued her education at UC Berkeley, completing a doctorate in probability theory.
The Daily Californian: Berkeley has several renowned departments, including CS (computer science), statistics and economics. With that in mind, where do you see data science fitting in? Does it form its own unique discipline, or do you see it as an overlap?
Ani Adhikari: You know, I don't actually think it is unique. I mean, statisticians have been doing data science for a long time. What has changed is that the tools we now have make it possible for us to answer, and also to ask, questions we could not have conceived of asking before. And so, it is now a collaboration between the statistics department and computer science for developing the fundamental theory, and a collaboration with every other domain for developing theory that actually makes sense for domain applications. Now, the latter has always been true for statistics: you're not just doing mathematics for its own sake; it has to be meaningful to somebody else. I think it's a much bigger change of mindset for people in computer science than for people in statistics.
All of this that is [considered] very new [is] not new for us at all. It's that computer science, until quite recently, maybe about 10 years ago, was primarily used for building things such as software. It's an engineering department. But increasingly, computer science is being used for analysis. So I think for computer scientists, this new direction for their field is genuinely new. For us statisticians, it seems like what we've always been doing, but on a much bigger scale, which actually informs what we can conceive of today.
DC: So I'm a freshman myself, and this next question is one that a lot of incoming freshmen, including me, had this past year. How does the data science major differ from a double major in CS and statistics?
AA: You know, statistics majors have, for decades, been joint majors. It used to be that econ/stats was the most common combination, but increasingly it has become CS and stats. For us, it’s just not new — this idea that students will learn inference and they will learn computing. It has been happening for quite some time. So the data science major is one where you are formalizing some things for students who are interested in a broader experience than they could have with CS and stats. The CS-and-stats double major is often somebody who is exceptionally qualified at developing the field — not just running programs, but understanding the underpinnings of data science. But there is an increasing number of people who use data science and who bring good questions to data science and who have a very perceptive sense of what is the right thing to do and how to interpret it — who do not necessarily need to develop all the theoretical underpinnings but whose insight develops the field. That person might want to be a data science major.
A properly designed CS-and-stats major not only gives you theoretical underpinnings in both disciplines but, because stats has asked for an applied cluster, also gives you some sense of where you are going to apply these insights. But a data science major could also be somebody who is primarily a social scientist but who is going to make decisions based on vast amounts of data, and hence has to understand how these processes work. That social scientist will take some parts of the CS curriculum and some parts of the stats curriculum but will also have a domain emphasis that allows them to apply this knowledge to whatever area interests them.
DC: People often associate data science with buzzwords such as “machine learning.” What are your thoughts on this?
AA: Methods for statistical inference have always had the same base, but they have continued to evolve. From basic linear regression models to the more complex ones of today, statistics has always been about trying to make sense of data in the best and most appropriate way possible. I think every generation has its own buzzwords, but the foundations behind them always remain the same.
DC: Do you think that being a “division” instead of a “department” disadvantages the data science program in any manner?
AA: Data science being a division instead of a department is actually advantageous, because it combines applications from so many fields. Being a division allows for increased collaboration between professors and students from across the traditional departments.
DC: What expectations would you have for someone graduating with a degree in data science? Any parting words?
AA: Understand the data, respect where the data has come from, be aware of biases within the data, and truly understand the ethical implications of the power to analyze data. Berkeley students no doubt have the caliber to achieve all of the above.
Contact Arth at [email protected].