Date and Time: March 13, 2017, 3:30 pm – 5:15 pm
Location: CDM 209
Abstract: Researchers and practitioners use social media to extract actionable patterns about human behavior. However, the validity of these patterns hinges, in part, on using a dataset that is representative of society. The data collected from social media is not always representative of the real world, and in many cases it is not even representative of social media itself. This talk will introduce ways in which the social data upon which inferences are drawn differs from the underlying populations and trends in the real world. Furthermore, I will discuss the statistically significant differences between the data generated on social media and the social media data commonly used in research. These observations pave the way for the discovery and removal of bias within social media data. Next, I will introduce methodologies to clearly extract patterns from social media data by identifying and removing specific sources of bias. This has important implications for social media mining, namely that the behavioral patterns and insights extracted will be more representative of society. This will allow researchers and practitioners to obtain more accurate measurements and findings from social media data.
Bio: Fred Morstatter is a PhD candidate in computer science at Arizona State University in Tempe, Arizona. His research focuses on finding and removing biases that can skew research results from big social data. He has been a Visiting Scholar at Carnegie Mellon University as well as a Research Intern at Microsoft Research. He is the Principal Architect for TweetXplorer, a visual analytic system for Twitter data. A full list of publications can be found at www.fredmorstatter.com.
DePaul Analytics Group