Abstract
Data mining techniques have gained acceptance as a reliable means of finding information in data. In the past, these techniques have been applied effectively to discover patterns, find correlations, and extract information from unstructured data. Often overlooked, however, is that they have been applied effectively only to datasets from fields such as finance, healthcare, physics, and chemistry. We identified that the study of U.S. state supreme courts is significantly constrained by the lack of available data. To address this problem, we conducted research to produce an original dataset covering every state supreme court ruling from 1953 through 2014. We used dynamic textual analysis to search through the case files of thousands of state supreme court decisions and extract critical information on each case. We present a trend analysis of the case distribution across states, the month in which each case was submitted, the regional reporter, and the legal issues heard before the court. Following the synthesis of the dataset, we used a neural network architecture to prepare a vector representation that groups cases with similar characteristics, based on the text in the overview section of each case. We also used a generative statistical model to cluster the 2.1 million extracted cases into 17 bins based on word presence in the case text. A validation study conducted with political science researchers shows that all three datasets, (1) the Original Supreme Court Dataset, (2) the Vector Representation of Case Summaries, and (3) the Topic Clusters of Case Summaries, are highly structured and meaningful. These datasets will therefore offer scholars enormous possibilities to expand the knowledge of judicial politics in the American states.