Haley Johnson

In my "professional life," I have been a data scientist, a public health communications intern, a researcher at a voter engagement nonprofit, a middle school swim coach, and (my favorite job) a barista.

These are mostly class projects from undergrad and grad school, along with some personal projects.

Data Science

Master's degree capstone project. Developed novel methods to estimate public opinion using small, non-representative samples, which are the norm in the polling industry. We developed our method using data from 2020 and then applied it to forecast the results of the 2024 election.

GitHub Repo | Project Website

Final project for EECS 592, Foundations of Artificial Intelligence. Replicated and extended Feng et al.'s methodology for inducing political bias in large language models. Measured how different ideological biases affected performance on a downstream misinformation detection task.

GitHub Repo | Slides | Project Website

Final project for SI 670, Machine Learning. Created a machine learning model to predict whether a water main will break in the next 3 years in Ann Arbor, Michigan. These predictions can help the city do preventive maintenance and mitigate infrastructure failures.

GitHub Repo

Developed a semantic image segmentation model to identify advertisements. Used edge detection to extract ads and optical character recognition to examine their content.

Capstone project for my undergraduate degree, in partnership with Microsoft Research.

GitHub Repo

Project for SI 649, Data Visualization, creating two pieces of data journalism about disability to accompany this article

View the interactive visualization here and the static visualization here

GitHub Repo

Final project for SI 370, Data Exploration, examining the similarities between genres of over 70k GoodReads books based on each book's description. Used natural language processing techniques and network analysis to create clusters of similar genres.

GitHub Repos

Developed a classifier to detect tweets that express climate denial. Analyzed the performance of BERT embeddings, GloVe embeddings, and a simple bag-of-words model, as well as different neural network architectures and techniques for natural language processing.

GitHub Repo

Built a relational database of over 200 trails in U.S. National Parks using data from AllTrails. Used Python to manipulate and normalize the data before inserting it into a SQL database.

GitHub Repo

Teaching

Graduate Student Instructor

Undergraduate Teaching Assistant