Abstract

State-of-the-art conversational agents have advanced significantly alongside the adoption of large transformer-based language models. Even with these rapid advancements, however, the current generation of conversational agents suffers from three major problems: (i) long-term context modeling; (ii) producing informative and factually accurate responses; and (iii) robust evaluation of NLG systems. Our work tackles these three gaps. (i) To address long-term context modeling, we present a novel end-to-end approach inspired by neurocognitive memory processes. We also implement a novel action selection mechanism that identifies relevant utterances containing salient information and transfers them from long-term memory to working memory, incorporating the context of the conversation into the generation process more effectively than state-of-the-art systems. (ii) To integrate knowledge into conversational agents, we propose a dialog framework that incorporates both local knowledge and users' past dialogues to generate high-quality personalized conversations. Using our framework, we demonstrate through human evaluations that incorporating local knowledge substantially improves informativeness, coherency, and realisticness. Even with these advancements, however, we find that knowledge-grounded conversation models are prone to hallucinations. We address this issue by proposing a new dataset, "CONV-FEVER", for building a fact consistency detector. We show that our detector outperforms the current state of the art and can be integrated with existing models to increase the factual consistency of knowledge-grounded models. (iii) In the last part of this thesis, we focus on the impact of experiment design on the evaluation of conversational AI systems by conducting two large-scale studies. In the first, we compare four experimental designs and examine how each affects the quality of outputs obtained from human evaluation. In the second, we investigate cognitive biases, particularly anchoring bias, and demonstrate their impact on human evaluation of NLG systems.
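
To make the action selection idea in (i) concrete, here is a minimal sketch of relevance-based retrieval from long-term memory into working memory. It assumes utterances are stored as (text, embedding) pairs and scored by cosine similarity against the current dialogue context; the function names, the top-k cutoff, and the threshold are illustrative assumptions, not the mechanism actually described in the thesis.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def select_salient(long_term_memory, context_vec, k=3, threshold=0.3):
    """Score every stored utterance against the current dialogue context
    and promote the top-k sufficiently relevant ones into working memory.

    long_term_memory: list of (utterance_text, embedding) pairs
    context_vec:      embedding of the current dialogue context
    """
    scored = sorted(
        ((cosine(vec, context_vec), utt) for utt, vec in long_term_memory),
        key=lambda pair: pair[0],
        reverse=True,
    )
    return [utt for score, utt in scored[:k] if score >= threshold]

# Hypothetical usage: in practice the embeddings would come from a
# sentence encoder; random vectors stand in here just to run the sketch.
rng = np.random.default_rng(0)
memory = [(f"utterance {i}", rng.normal(size=8)) for i in range(10)]
working_memory = select_salient(memory, rng.normal(size=8), k=2, threshold=-1.0)
print(working_memory)
```

The generator would then condition on the selected working-memory utterances alongside the recent dialogue turns, rather than on the full (and ever-growing) history.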
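
For the fact consistency detector in (ii), the thesis trains on CONV-FEVER; as a stand-in, the sketch below scores how well a generated response is entailed by its grounding knowledge using an off-the-shelf NLI model. The model choice (roberta-large-mnli) and the use of the entailment probability as a consistency score are assumptions for illustration, not the detector the abstract describes.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL = "roberta-large-mnli"  # stand-in NLI model, not the CONV-FEVER detector
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)
model.eval()

def consistency_score(knowledge: str, response: str) -> float:
    """Treat the grounding knowledge as premise and the generated response
    as hypothesis; return the entailment probability as a consistency score."""
    inputs = tokenizer(knowledge, response, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = torch.softmax(logits, dim=-1)[0]
    # roberta-large-mnli label order: 0=contradiction, 1=neutral, 2=entailment
    return probs[2].item()

knowledge = "The Eiffel Tower is located in Paris and was completed in 1889."
print(consistency_score(knowledge, "The Eiffel Tower was finished in 1889."))
print(consistency_score(knowledge, "The Eiffel Tower is in Berlin."))
```

A score like this could be thresholded to flag likely hallucinations or used to re-rank generation candidates, which is one plausible way a detector would be integrated with existing knowledge-grounded models.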
