
Democratization of data science and emergence of citizen scientists


MAY 26, 2017

The emerging trend of cheap, widely available analytical tools has led to a “democratization” of data science and the rise of the so-called citizen scientist. As more computational software and data science educational material, such as UC Berkeley’s own online data science lectures, becomes available to more people, citizen data scientists play a larger role. Researcher Alexander Linden describes them as “people … that may have some data skills … and (put) them to work exploring and analyzing data.” These new data practitioners make data analysis faster, cheaper and often more accurate, and they can alter the average Joe’s relationship to Big Data.

The new amateur scientists allow a greater number of people to work on a single project, increasing the accuracy of research and helping society cope with the dearth of data scientists. Websites such as Kaggle make this possible by establishing communities in which citizen data scientists collaborate and compete in public competitions. Students at UC Berkeley can explore Kaggle by taking the aptly titled Data Science for Kaggle DeCal. UC Berkeley is not the only university aiding the rise of the citizen scientist. Schools like Carnegie Mellon and Cornell University have pioneered programs in data science and, like UC Berkeley, have provided valuable data science content to the public.

Moreover, according to one academic paper, the more people who are involved in data science, the more analysis will be geared toward tangible results. The idea is that because everyday people tend to care less about theoretical issues, they can steer research efforts toward solutions to real-world problems. If, for example, the government uses data to determine whether a certain UC chancellor misused public funds, students’ interest in the topic might lead to a conclusion faster than any impartial researcher would reach one.

Beyond searching for evidence of Dirks’ misuse of public funds, citizen science has other uses, such as data collection. The Stardust Mission, a UC Berkeley-based NASA project, was a great example of using the global reach of the internet to collect data on a mass scale. The main goal of the project was “to collect samples of a comet and return them to Earth for laboratory analysis.” Using the data collected from the mission, NASA came to understand the conditions under which comets were born, a profound scientific discovery that emerged thanks to a scientifically active citizenry. Modern data collection projects (that you, the reader, should probably check out) are underway at websites like SciStarter, which gives people opportunities to collect data on everything from landslides to sound pollution.

In a general sense, citizen examination of data allows the public to better understand the narratives that data analysis reveals. This wider dissemination of knowledge also gives ordinary people the power to form their own interpretations and draw their own conclusions about analytic results. In the global arena, computational tools can contribute to the growth and improvement of less developed communities.

It’s worth pointing out that the accessibility of data science software is symptomatic of a larger movement known as the “democratization of knowledge.” The egalitarian spread of ideas, first through the printing press beginning in the 15th century and now through the internet, has given the power of knowledge to a larger number of people, promoting educational equality and the ability to think analytically.

Contact Melany Dillon at [email protected].
