Haley Johnson

In my "professional life," I have been a data scientist, a public health communications intern, a researcher at a voter engagement nonprofit, a middle school swim coach, and (my favorite job) a barista.

Data Science

Master's degree capstone project. Developed novel methods to estimate public opinion using small, non-representative samples, which are the norm in the polling industry. We developed our method using data from 2020 and then applied it to forecast the results of the 2024 election.

GitHub Repo | Project Website

Final project for EECS 592, Foundations of Artificial Intelligence. Replicated and extended Feng et al.'s methodology to induce political bias in large language models. Measured how different ideological biases affected performance on a downstream misinformation detection task.

GitHub Repo | Slides

Final project for SI 670, Machine Learning. Created a machine learning model to predict whether a water main in Ann Arbor, Michigan, will break in the next three years. These predictions can help the city perform preventive maintenance and mitigate infrastructure failures.

GitHub Repo

Developed a semantic image segmentation model to identify advertisements. Used edge detection to extract ads and optical character recognition to examine their content.

Capstone project for my undergraduate degree, in partnership with Microsoft Research.

GitHub Repo

Project for SI 649, Data Visualization. Created two pieces of data journalism about disability to accompany this article.

View the interactive visualization here and the static visualization here.

GitHub Repo

Final project for SI 370, Data Exploration, examining the similarities between the genres of over 70k GoodReads books based on their descriptions. Used natural language processing techniques and network analysis to create clusters of similar genres.

GitHub Repos

Developed a classifier to detect tweets that express climate denial. Analyzed the performance of BERT embeddings, GloVe embeddings, and a simple bag-of-words model, as well as different neural network architectures and natural language processing techniques.

GitHub Repo

Built a relational database of over 200 trails in U.S. National Parks using data from AllTrails. Used Python to clean and normalize the data before inserting it into a SQL database.

GitHub Repo

Teaching

I was the head graduate student instructor for intermediate Python programming in Winter 2024. Previously, I taught intro to Python (Fall 2021), intro to network science (Fall 2022), and data science for public policy (Fall 2023).

Some of my teaching materials from data science for public policy (focused on coding skills) are available here.