Borse, Chetan
Interactive Document Retrieval
1 online resource (81 pages) : PDF
2017
University of North Carolina at Charlotte
Traditional Information Retrieval systems respond to a user's search query with a large volume of relevant information, which usually spans several semantic groups. Looking through all of the retrieved information is tedious, and the user is usually interested only in the retrieved documents that belong to one or another of these underlying semantic groups. This problem motivated researchers to develop Interactive Document Retrieval systems that narrow down the search and help the user locate information quickly. The idea is to identify the semantic groups by clustering the retrieved documents and to provide a summary for each cluster. In this research, we propose new approaches to document clustering and multi-document summarization. Our document clustering approach groups the retrieved documents in a semantic space of concepts using document embeddings, which are obtained by training Doc2Vec on the conceptualized document collection. Measured by F-measure, the proposed approach improves document clustering performance by approximately 6% compared with state-of-the-art techniques. The proposed multi-document summarization technique extracts the sentences with the highest importance scores, which are computed using the lexical centrality principle; the power iteration in our algorithm operates on sentence embeddings obtained with the PV-DM model. This technique improves multi-document summarization accuracy by nearly 4% as measured by ROUGE-1. Together, these approaches improve the Interactive Document Retrieval framework.
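The summarization step above scores sentences by lexical centrality computed with power iteration over sentence embeddings. The following is a minimal, LexRank-style sketch of that idea, not the thesis's implementation. It assumes the sentence vectors have already been produced by a PV-DM model (for example, gensim's `Doc2Vec` with `dm=1`) and are passed in as plain lists of floats. The function names `cosine`, `centrality_scores`, and `summarize`, as well as the `threshold`, `damping`, and `tol` defaults, are hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def centrality_scores(vectors: List[Sequence[float]], threshold: float = 0.1,
                      damping: float = 0.85, tol: float = 1e-10,
                      max_iter: int = 1000) -> List[float]:
    """LexRank-style centrality via power iteration (hypothetical defaults)."""
    n = len(vectors)
    if n == 0:
        return []
    # Thresholded similarity graph over sentence embeddings; the diagonal
    # (self-similarity) is kept, so nonzero vectors never yield an empty row.
    adj = [[1.0 if cosine(vectors[i], vectors[j]) > threshold else 0.0
            for j in range(n)] for i in range(n)]
    # Row-normalise into a stochastic transition matrix; an empty row
    # (e.g. from an all-zero vector) becomes a uniform row.
    trans = []
    for row in adj:
        deg = sum(row)
        trans.append([x / deg for x in row] if deg > 0 else [1.0 / n] * n)
    # Power iteration with uniform teleportation (PageRank-style damping).
    p = [1.0 / n] * n
    for _ in range(max_iter):
        new_p = [(1.0 - damping) / n
                 + damping * sum(p[i] * trans[i][j] for i in range(n))
                 for j in range(n)]
        delta = sum(abs(a - b) for a, b in zip(new_p, p))
        p = new_p
        if delta < tol:
            break
    return p


def summarize(sentences: List[str], vectors: List[Sequence[float]],
              k: int = 1, **kwargs) -> List[str]:
    """Return the k most central sentences, kept in their original order."""
    scores = centrality_scores(vectors, **kwargs)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

As a toy check, take the vectors `[1, 1, 1]`, `[1, 0, 0]`, `[0, 1, 0]`, `[0, 0, 1]` with `threshold=0.3`. The first vector is similar to all the others, while the remaining three are mutually orthogonal, so the first sentence is the hub of the graph. It scores 37/97 (about 0.381), each of the others scores 20/97 (about 0.206), and `summarize(["a", "b", "c", "d"], vecs, k=1, threshold=0.3)` returns `["a"]`. A thresholded binary graph keeps the sketch short. Weighting edges by raw cosine similarity is a common alternative.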
masters theses
Computer science
M.S.
Document Clustering; Document Embedding; Information Retrieval; Interactive Search; Search Engine
Computer Science
Zadrozny, Wlodek
Ras, Zbigniew; Shaikh, Samira
Thesis (M.S.)--University of North Carolina at Charlotte, 2017.
This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s). For additional information, see http://rightsstatements.org/page/InC/1.0/.
Copyright is held by the author unless otherwise indicated.
Borse_uncc_0694N_11579
http://hdl.handle.net/20.500.13093/etd:143