Abstract
Analysts and domain experts in various fields rely on collecting data about their subjects to understand and predict their behavior. Characterizing and modeling human behavior requires analyzing large amounts of data from heterogeneous sources, which is challenging for researchers using traditional methods. Social media platforms have been used in the social sciences and in different industries to understand subjects in online settings. The advantage of the online setting is the ease of accessing large amounts of data, which solves the problem of data availability that occurs in offline settings. However, the data collected from social media is often messy and noisy. Therefore, many visual analytics (VA) tools have been built to help domain experts overcome those challenges efficiently. In this dissertation, I show how VA systems can leverage data to improve two major types of analysis tasks that support the discovery of users’ behavior on social media. Both analysis tasks are related to the process of inferring user categories, which are predefined by the domain expert. I illustrate the usability of VA for enhancing these tasks by applying the same research questions to different applications. The first analysis task involves understanding the connection between social media users’ behavior and their demographics. The second task involves labeling the social media users themselves according to the expert’s observations of their behavior. The VA systems characterize the users’ behavior through a suite of multiple coordinated views coupled with predictive models. These models are based on the textual information derived from the users’ posts.
The first application, DemographicVis, supports the understanding of the connection between users’ demographic information and user-generated content. My approach in this application allows domain experts to make sense of the connection between categorical data, namely the users’ demographics, and textual data, namely their posts. This connection shows the characteristics of different demographic groups in a transparent and exploratory manner. Users’ posts are used to model and comprehend the demographic groups through the features that best characterize each group. The interactive interface of DemographicVis also enables the exploration of the predictive power of various features.
In the second application, I propose a VA system for domain experts to categorize and label Twitter users. This work was motivated by the need to remove bots from social media datasets, since they produce noise that impedes the analysis. I address this challenge by providing an interface that enables communications experts to distinguish bots from other types of users in an active learning setting. In this setting, the experts iterate between labeling the users and running predictive models, based on these labels, to enhance their decisions in future labeling rounds.
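The iterative labeling process described above can be illustrated with a minimal sketch. It is not the system's actual implementation: the naive Bayes text model, the uncertainty-sampling rule, and all names (train_nb, prob_bot, most_uncertain, active_learning, oracle) are assumptions chosen for illustration, with the oracle function standing in for the expert who supplies labels through the interface.

```python
import math
from collections import Counter

LABELS = ("bot", "human")  # hypothetical user categories for this sketch


def train_nb(labeled):
    """Train a tiny multinomial naive Bayes model from (text, label) pairs."""
    word_counts = {label: Counter() for label in LABELS}
    class_counts = Counter()
    for text, label in labeled:
        class_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set().union(*word_counts.values())
    return word_counts, class_counts, vocab


def prob_bot(model, text):
    """Return the model's estimated probability that the text is from a bot."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    log_scores = {}
    for label in LABELS:
        # Laplace smoothing on the class prior and on the word likelihoods.
        score = math.log((class_counts[label] + 1) / (total + len(LABELS)))
        denom = sum(word_counts[label].values()) + len(vocab) + 1
        for word in text.lower().split():
            score += math.log((word_counts[label][word] + 1) / denom)
        log_scores[label] = score
    # Turn the two log scores into a normalized probability.
    top = max(log_scores.values())
    exp_scores = {k: math.exp(v - top) for k, v in log_scores.items()}
    return exp_scores["bot"] / sum(exp_scores.values())


def most_uncertain(model, pool):
    """Uncertainty sampling: pick the text whose bot probability is nearest 0.5."""
    return min(pool, key=lambda text: abs(prob_bot(model, text) - 0.5))


def active_learning(seed, pool, oracle, rounds):
    """Alternate between retraining the model and asking the oracle for a label."""
    labeled = list(seed)
    pool = list(pool)
    for _ in range(rounds):
        if not pool:
            break
        model = train_nb(labeled)
        query = most_uncertain(model, pool)
        pool.remove(query)
        labeled.append((query, oracle(query)))  # the expert labels the queried user
    return train_nb(labeled), labeled
```

In each round the model is retrained on all labels collected so far, and the expert is asked about the item the model is least sure of, so that each new label is spent where the current model is weakest.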