Twitter Analysis Replication
Nov-18
In this lesson, we will replicate the Twitter disaster analysis in a new context.
The goal of this lab is to replicate analyses of Twitter data in the context of natural hazards or another type of major event. Your analysis may therefore closely parallel the Hurricane Dorian study, but it must focus on a new case study and a new set of data.
Expectations
- Complete a GitHub repository / research compendium for a replication study, including a report
- The report discussion should compare your results to our Hurricane Dorian study and to Wang et al. (2016), and respond to any relevant concerns raised by Crawford and Finn (2014).
- Synthesize exciting findings in a blog post, linking to the research compendium
Recommendations
- Modify the OR-Dorian repository to query your own set of data from Twitter.
- You may modify both the search terms and the geographic extent of the search.
- Remember that the query may need to run for a few hours if it returns a large number of results.
- This type of analysis is highly dependent on trending Twitter activity over the past week. Create a query that will result in many thousands of tweets in the United States.
- You may also want a second query to establish a baseline of normal Twitter activity not related to your search. This could be an identical query that simply removes the search term constraints, or an identical query that finds the inverse of the search terms (i.e. all tweets that do not contain any of the keyword search terms). This second query will enable you to calculate a normalized tweet difference index.
- Be cautious when mapping results from Getis-Ord G*: you may not find all categories of significance, and this may require adjusting your map classification.
- Many researchers suggest using Bonferroni correction to set more conservative critical values in the context of multiple hypothesis tests.
- There are opportunities to improve the analysis through:
- better social network analysis, perhaps by including retweets
- filtering content by region, time, and/or scale to refine the semantic and network analysis
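Two of the recommendations above involve simple arithmetic that can be sketched in a few lines. The example below is a minimal illustration, not the code from the OR-Dorian repository. It assumes the normalized tweet difference index takes the usual normalized-difference form, (event - baseline) / (event + baseline), and that areas with no tweets in either query are assigned 0. The function names and the county counts are hypothetical.

```python
from statistics import NormalDist


def ndti(event_count, baseline_count):
    """Normalized difference of event vs. baseline tweet counts, in [-1, 1]."""
    total = event_count + baseline_count
    if total == 0:
        # Assumption: areas with no tweets in either query are scored 0.
        return 0.0
    return (event_count - baseline_count) / total


def bonferroni_critical_z(alpha, n_tests):
    """Two-sided critical z-score after dividing alpha by the number of tests."""
    adjusted_alpha = alpha / n_tests
    return NormalDist().inv_cdf(1 - adjusted_alpha / 2)


# Hypothetical per-area counts: (tweets matching the event query, baseline tweets)
counts = {"A": (30, 10), "B": (5, 15), "C": (0, 0)}

scores = {name: ndti(event, baseline) for name, (event, baseline) in counts.items()}
print(scores)  # {'A': 0.5, 'B': -0.5, 'C': 0.0}

# One test per area: the corrected threshold is stricter than the single-test one.
print(round(bonferroni_critical_z(0.05, 1), 2))  # 1.96
print(bonferroni_critical_z(0.05, len(counts)))
```

With one hypothesis test per mapped area, the corrected critical value grows as the number of areas grows, so fewer areas will be classified as significant hot or cold spots. This is one reason some significance categories may be empty on the map.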
Hurricane Ida Replication
- If you are interested in analyzing Hurricane Ida data, the code for searches is available here (currently a private repository for Middlebury students) and the data is available here (Middlebury students only).
- This disaster unfolded over a long period of time, requiring multiple Twitter API searches. The searches must be combined without duplicating tweets, and the combined data should be checked for gaps in coverage.
- Retweets are included
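The combining step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the course's search code: each tweet is represented as a dictionary with hypothetical `id` and `created_at` fields, and a gap is defined as any interval between consecutive tweets longer than a chosen threshold. The timestamps are made up for the example.

```python
from datetime import datetime, timedelta


def combine_searches(*searches):
    """Merge several lists of tweets, keeping the first copy of each tweet ID."""
    unique = {}
    for search in searches:
        for tweet in search:
            unique.setdefault(tweet["id"], tweet)
    # Sort chronologically so that gaps can be checked on consecutive tweets.
    return sorted(unique.values(), key=lambda t: t["created_at"])


def find_gaps(tweets, max_gap):
    """Return (start, end) pairs where consecutive tweets are more than max_gap apart."""
    times = [t["created_at"] for t in tweets]
    return [(a, b) for a, b in zip(times, times[1:]) if b - a > max_gap]


# Hypothetical results from two overlapping searches (tweet 2 appears in both).
first_search = [
    {"id": 1, "created_at": datetime(2021, 8, 29, 12, 0)},
    {"id": 2, "created_at": datetime(2021, 8, 29, 12, 30)},
]
second_search = [
    {"id": 2, "created_at": datetime(2021, 8, 29, 12, 30)},
    {"id": 3, "created_at": datetime(2021, 8, 29, 20, 0)},
]

combined = combine_searches(first_search, second_search)
gaps = find_gaps(combined, max_gap=timedelta(hours=6))

print(len(combined))  # 3
print(gaps)
```

The right threshold depends on how busy the query is: for a heavily trending topic, even a short silence may indicate a missed search window, while a sparse query can have long quiet periods with nothing missing.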
Saving graphs