PROBLEM: Data science is an interdisciplinary field, yet in practice it is largely restricted to students and faculty with extensive backgrounds in computer science and engineering.
USERS: Undergraduate and graduate students at UC Berkeley who are uncomfortable applying data science in their fields or courses. We also create comprehensive guides for faculty who would like to incorporate data science methods into their curriculum or research.
CONSTRAINTS: Each module team had only about 4 members. Most development cycles ran 8-12 weeks. We were limited to 2-3 meetings per client.
DESIGN VALIDATION: Feedback forms from students, researchers, and professors. Pivots included adding a longer introduction to packages such as pandas and numpy, and restricting the aesthetics of visualizations, especially in natural language processing models.
NEXT STEPS: Infiltrate more fields of study!
Here are some Modules I have led!
Sociology 130AC: Neighborhood Mapping
We mapped and visualized socioeconomic and demographic variation across East Bay census tracts using crowdsourced student data. Students go out into neighborhoods, make qualitative observations, and then compare them to census data. The notebook maps these qualitative observations, combining individual observations into an overall map. Students engage with the data they have gathered and can explore it across student groups, socioeconomic categories, or geographic locations.



XRHET 1A: Moral Foundations Theory
We connected word use in political speeches to Moral Foundations Theory, then built statistical inferences and visualizations from this data to help students look for rhetorical differences between conservative and liberal presidential candidates. Students then engage with and critique data-driven methods themselves as rhetorical tools.
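As a rough illustration of the word-counting idea above, here is a minimal sketch using only the Python standard library. The word lists and the two sample sentences are invented placeholders, not the actual Moral Foundations Dictionary or real speech text, and the function name foundation_rates is hypothetical; the module itself may have worked quite differently.

```python
import re
from collections import Counter

# Hypothetical mini-dictionary for illustration only.
# This is NOT the real Moral Foundations Dictionary.
FOUNDATION_WORDS = {
    "care": {"care", "protect", "compassion", "harm"},
    "fairness": {"fair", "equal", "justice", "rights"},
    "loyalty": {"nation", "loyal", "together", "family"},
    "authority": {"law", "order", "respect", "duty"},
    "sanctity": {"sacred", "pure", "faith", "holy"},
}


def foundation_rates(speech):
    """Return, for each foundation, its matches per 100 words of the speech."""
    words = re.findall(r"[a-z']+", speech.lower())
    counts = Counter()
    for word in words:
        for foundation, vocab in FOUNDATION_WORDS.items():
            if word in vocab:
                counts[foundation] += 1
    total = len(words) or 1  # avoid dividing by zero on empty input
    return {f: 100 * counts[f] / total for f in FOUNDATION_WORDS}


# Invented sentences standing in for two speeches.
speech_a = "We must protect every family and care for our nation together"
speech_b = "Law and order demand respect and duty to our sacred nation"

rates_a = foundation_rates(speech_a)
rates_b = foundation_rates(speech_b)

for foundation in FOUNDATION_WORDS:
    print(f"{foundation:10s} A={rates_a[foundation]:5.1f}  B={rates_b[foundation]:5.1f}")
```

Normalizing by speech length keeps long and short speeches comparable, which matters before drawing any inference about differences between candidates.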

Cuneiform 102: Sumerian Text Analysis
We worked with the Electronic Text Corpus of Sumerian Literature (ETCSL), a data set of texts translated from fragmented tablets as old as 6000 years. The techniques used were mostly text-analysis methods such as k-means, hierarchical clustering, and multidimensional scaling. These make it possible to classify a newly translated text against past Sumerian literature, as well as to create tree graphs and cluster plots.
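To make the clustering step more concrete, the sketch below shows one of the named techniques, hierarchical (agglomerative) clustering, on word-count vectors with cosine distance. It uses only the standard library; the four short "texts" are invented stand-ins rather than ETCSL content, and the helper names are hypothetical. It does not cover k-means or multidimensional scaling.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term counts for a whitespace-tokenized text."""
    return Counter(text.lower().split())


def cosine_distance(a, b):
    """1 - cosine similarity between two term-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return 1.0 - dot / norm if norm else 1.0


def agglomerative(vectors, n_clusters):
    """Average-linkage agglomerative clustering; returns lists of indices."""
    clusters = [[i] for i in range(len(vectors))]

    def linkage(c1, c2):
        # Mean pairwise distance between members of the two clusters.
        total = sum(cosine_distance(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge them.
        _, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Invented placeholder texts, not taken from the ETCSL.
texts = [
    "king city temple king offering",
    "temple offering king city",
    "sheep grain field harvest grain",
    "field harvest sheep grain",
]
vectors = [tf_vector(t) for t in texts]
clusters = agglomerative(vectors, n_clusters=2)
print(clusters)
```

Recording the order of merges instead of stopping at a fixed number of clusters would give the tree structure that a dendrogram plots.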


EPS C20: Earthquakes in Your Backyard
Students select a recent earthquake in California, download data for nearby seismometers, and plot their seismograms. From there, they calculate each station’s peak amplitude and distance from the earthquake.
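The last step above, peak amplitude and distance per station, can be sketched as follows. This is a minimal standard-library example under stated assumptions: the epicenter, the station names and coordinates, and the sample values are all placeholders rather than real seismic data, and distance is approximated with the haversine great-circle formula on a spherical Earth, ignoring earthquake depth.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def epicentral_distance_km(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance between two points, in km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def peak_amplitude(samples):
    """Largest absolute value in a seismogram trace."""
    return max(abs(s) for s in samples)


# Placeholder epicenter and stations (hypothetical names and values).
epicenter = (37.0, -122.0)
stations = {
    "STA1": {"lat": 38.0, "lon": -122.0, "samples": [0.1, -0.8, 0.3]},
    "STA2": {"lat": 37.0, "lon": -121.0, "samples": [0.05, 0.2, -0.4]},
}

# Map each station name to (distance in km, peak amplitude).
summary = {
    name: (
        epicentral_distance_km(epicenter[0], epicenter[1], s["lat"], s["lon"]),
        peak_amplitude(s["samples"]),
    )
    for name, s in stations.items()
}

for name, (dist_km, peak) in summary.items():
    print(f"{name}: {dist_km:.1f} km, peak amplitude {peak}")
```

Plotting peak amplitude against distance for many stations is a natural next step, since amplitude generally falls off as distance grows.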

Here are some additional Modules I have worked on:
Gender and Women’s Studies 131: Gender and Science
We used salary and gender data for UC Berkeley’s ladder-rank (tenure-track) faculty, Silicon Valley data compiled from the EEO-1 reports of top tech firms, and Bay Area census data. We created statistical analyses relating gender, race, salary, and position.


Psychology 167 AC: Implicit Bias and Social Outcomes
Students pick from a set of datasets on health outcomes and a set of datasets on implicit bias, both at the county level for the entire US. They then merge the two datasets by census code and use correlation and regression to see whether there are relationships between biases and health outcomes.
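The merge-then-correlate workflow described above might look roughly like this. It is a standard-library sketch under stated assumptions: the county codes and all values are made-up placeholders, and the helper names (inner_join, pearson_r, ols_fit) are hypothetical. In a course notebook the same steps would more likely be done with a data-frame library.

```python
import math

# Made-up placeholder data keyed by a hypothetical county code.
bias_by_county = {"C001": 0.31, "C002": 0.35, "C003": 0.28, "C004": 0.40}
health_by_county = {"C001": 7.2, "C002": 7.9, "C003": 6.8, "C005": 6.5}


def inner_join(left, right):
    """Keep only codes present in both datasets (an inner merge)."""
    keys = sorted(left.keys() & right.keys())
    return [(k, left[k], right[k]) for k in keys]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def ols_fit(xs, ys):
    """Least-squares line y = slope * x + intercept."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx


merged = inner_join(bias_by_county, health_by_county)
bias = [row[1] for row in merged]
health = [row[2] for row in merged]

r = pearson_r(bias, health)
slope, intercept = ols_fit(bias, health)
print(f"{len(merged)} counties merged, r = {r:.3f}, slope = {slope:.2f}")
```

Counties that appear in only one dataset drop out of the inner join, so it is worth checking how many rows survive the merge before interpreting the correlation.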


Advisor: Eric Van Dusen, PhD
Modules Coordinator: Keeley Takimoto

Honored to go to JupyterCon in New York, New York August 2018 with the Modules Team!