This is my dissertation, which is composed of three empirical chapters. The first chapter examines whether police agency racial diversity is related to black and white criminals' relative arrest rates. That is, can the over-representation of white police officers partially explain racial disparities in arrest rates?
I combine 2013 National Incident-Based Reporting System (NIBRS) data on reported criminals with law enforcement data, zip code-level demographic data, and county-level presidential election data to create a novel dataset. Please see my GitHub page for all the scraping, cleaning, and analysis code.
The plots below show black and white criminal offenders' likelihood of arrest according to police agency racial diversity. Each dot is a police agency's arrest rate for black or white criminals. Please click the buttons on the left to see plots for each of the nine offenses I examine. The x-axis is police agencies' percentage of officers who are white, while the y-axis is the percentage of reported criminal offenders who are arrested in the same year (2013).
In order to control for a variety of factors that might be related to both police diversity and offenders' likelihood of arrest, I estimate several types of statistical models. Because the data are hierarchical (individuals commit crimes and are arrested within agencies), I primarily use multilevel logistic regression models predicting individual criminals' likelihood of arrest. Please see the full dissertation for a more detailed discussion.
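As a rough illustration of what such a model looks like, a random-intercept logistic regression for offender i within agency j could be written as below. The predictors shown (an indicator for a black offender, the agency's percentage of white officers, and their interaction) are assumptions chosen to match the question above, not the exact specification in the dissertation.

```latex
\operatorname{logit}\,\Pr(\text{arrest}_{ij} = 1)
  = \beta_0
  + \beta_1\,\text{Black}_{ij}
  + \beta_2\,\text{PctWhite}_{j}
  + \beta_3\,(\text{Black}_{ij} \times \text{PctWhite}_{j})
  + u_j,
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2)
```

Here u_j is an agency-level random intercept that captures the clustering of offenders within agencies, and in this sketch the interaction coefficient is the term that would indicate whether the black-white gap in arrest likelihood varies with agency diversity.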
In the summer of 2016, I was a consulting data scientist at Applied Research Works, Inc., which is building a healthcare data analysis tool for doctors and insurance companies. Together with Dr. Rupinder Singh, I wrote a research paper on fee-for-service Medicare costs, which I submitted to the journal INQUIRY. I also made the map below to visualize my findings.
I have a few pet projects at various stages of completion. One is Transcribr, a web app that lets you convert audio (interviews) to text. A second project is the Yelp Dataset Challenge, for which I wrote a short paper on topics mentioned in restaurant reviews.
Another fun project is a NetLogo model I made to explore gender dynamics in a hypothetical industry. It shows how moderate levels of homophily and incoming gender disparities can impact company structure and employee outcomes. Please play around with it here or download the model.
I scraped approximately 200K Yelp.com reviews of Mexican restaurants for a project with Dr. Tomás Jiménez and Anna Boch. I also used Yelp's Academic Dataset (Round 8), which includes data from six major US cities. I made an interactive map to show the geographic distribution of restaurants and their reviews. I used regular expressions to code reviews' themes, such as whether they mention the ethnicity of the food or its authenticity. We use these data to understand how different forms of assimilation relate to cultural (food-related) assimilation. Please see this project's website and GitHub page.
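A minimal sketch of regex-based theme coding is shown here. The theme names, patterns, and example reviews are hypothetical and for illustration only; they are not the coding scheme used in the project.

```python
import re

# Hypothetical theme patterns for illustration; not the project's actual coding scheme.
THEME_PATTERNS = {
    "authenticity": re.compile(r"\b(authentic\w*|the real deal|legit)\b", re.IGNORECASE),
    "ethnicity": re.compile(r"\b(mexican|tex-mex|latin\w*)\b", re.IGNORECASE),
}


def code_review(text):
    """Return a dict mapping each theme to True if its pattern appears in the review."""
    return {theme: bool(pattern.search(text)) for theme, pattern in THEME_PATTERNS.items()}


# Made-up example reviews.
reviews = [
    "Truly authentic Mexican food, just like my grandmother's.",
    "Decent burritos, but the service was slow.",
]
coded = [code_review(review) for review in reviews]
```

Simple patterns like these flag a mention but not its direction: "not authentic" would still be coded as mentioning authenticity, so negation would need separate handling.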
One interesting finding was that feelings of warmth better distinguish political identities than do opinions on actual "political issues." For instance, Democrats and Republicans differ more on the top two principal components of the warmth statements than on the top two components of the political issues. However, both sets of principal components are poor predictors of identifying as an Independent, as Independents are similar to both Democrats and Republicans.
Together with Dr. Michael Rosenfeld and Taylor Orth, I have combined eight waves of the National Survey of Family Growth (NSFG). Using responses from 1972-2013, we have created a panel dataset of respondents' family, educational, and marital histories. We are currently conducting analyses to determine how predictors of marital dissolution (divorce or separation) have changed from the 1960s to the present day.
I took a class on network analysis (CS 224W) and conducted a project asking which types of crimes are often committed simultaneously. Together with Dustin Fink, I constructed and examined a network of arrest charges, with weights representing the number of people charged with both crimes at the same time. We improved the SimRank algorithm to efficiently compute charges' similarity to each other and wrote this paper.
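For readers unfamiliar with SimRank, here is a minimal sketch of the basic, unweighted version on a toy undirected graph. It is not the improved algorithm from the paper: it ignores edge weights, and the graph, the `decay` value, and the iteration count are assumptions made for illustration.

```python
from itertools import product


def simrank(neighbors, decay=0.8, iterations=10):
    """Basic SimRank: two nodes are similar if their neighbors are similar."""
    nodes = list(neighbors)
    # Start with each node fully similar to itself and dissimilar to all others.
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, repeat=2)}
    for _ in range(iterations):
        new_sim = {}
        for a, b in product(nodes, repeat=2):
            if a == b:
                new_sim[(a, b)] = 1.0
            elif not neighbors[a] or not neighbors[b]:
                new_sim[(a, b)] = 0.0
            else:
                # Average similarity over all pairs of neighbors, scaled by the decay factor.
                total = sum(sim[(i, j)] for i in neighbors[a] for j in neighbors[b])
                new_sim[(a, b)] = decay * total / (len(neighbors[a]) * len(neighbors[b]))
        sim = new_sim
    return sim


# Toy undirected graph of charges (made-up edges, listed in both directions).
graph = {
    "gambling": ["betting", "drugs"],
    "betting": ["gambling"],
    "drugs": ["gambling", "weapons"],
    "weapons": ["drugs"],
}
similarity = simrank(graph)
```

This naive version recomputes every pair of nodes on every iteration, which becomes slow on large networks; that cost is the usual motivation for more efficient variants.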
I also made the network visualization at the top of this page, in which edges represent the Chi-squared residuals of charges' arrest co-occurrences. That is, edges are weighted by the difference between actual and expected co-occurrences, divided by the square root of expected co-occurrences. You can see that people tend to be arrested for similar crimes together, such as gambling and betting. In addition, certain edges span clusters in understandable ways, such as weapons to/from drug offenses and kidnapping to/from forcible rape.
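The edge weights described above can be sketched as follows, assuming a symmetric table of co-occurrence counts and expected counts computed from row and column totals under independence. The counts are toy data, and the handling of the diagonal (left at zero) is an assumption rather than a description of the actual visualization code.

```python
import math


def pearson_residuals(counts):
    """Pearson (chi-squared) residuals for a square table of co-occurrence counts."""
    n = len(counts)
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(counts[i][j] for i in range(n)) for j in range(n)]
    grand_total = sum(row_totals)
    residuals = []
    for i in range(n):
        row = []
        for j in range(n):
            # Expected count if the two charges co-occurred independently.
            expected = row_totals[i] * col_totals[j] / grand_total
            if expected > 0:
                row.append((counts[i][j] - expected) / math.sqrt(expected))
            else:
                row.append(0.0)
        residuals.append(row)
    return residuals


# Toy co-occurrence counts (made up); rows and columns follow the order of `charges`.
charges = ["gambling", "betting", "drugs", "weapons"]
counts = [
    [0, 30, 2, 1],
    [30, 0, 1, 2],
    [2, 1, 0, 25],
    [1, 2, 25, 0],
]
residuals = pearson_residuals(counts)
```

In this toy table, the gambling-betting residual is positive (the pair co-occurs more often than expected) and the gambling-drugs residual is negative (less often than expected).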
Together with Drs. Cristobal Young and Charles Varner, I worked with California taxpayer data from 1987-2012. With over 430 million person-years of tax data, the project will answer several pressing questions, such as whether the tax on millionaires introduced in 2004 caused them to migrate out of California.
Another potential application of these data is to look at residential segregation by income. To do so, I made the figures below, which show zip codes' average incomes from 1995-2010.
I worked on a project with Dr. Young to determine how tax rates influence extremely wealthy individuals' choice of residence. I scraped Forbes.com for data on the 400 richest Americans, as well as on the world's millionaires. We will likely model migration using a discrete-time event-history (longitudinal logistic regression) model, or a gravity model.
In addition to the projects described above, I have a few "dormant" projects. One is a social psychological experiment that I conducted with the Laboratory for the Study of American Values. Using a nationally representative panel (YouGov), I manipulated the legality of two behaviors and measured respondents' attitudes towards the behavior. The findings suggest that illegality may increase perceived immorality, but further experiments are needed to eliminate several alternative hypotheses.
Another study uses Twitter data as a measure of public opinion. I used the Twitter Streaming API to gather tweets mentioning LGBT-related words before and after the Obergefell Supreme Court decision legalizing gay marriage nationwide. I hope to examine changes in the discussion of LGBT people and their rights resulting from this historic legal decision.