Abstract
Neural seq2seq models are widely used to generate dialogue, typically with variations of the RNN encoder-decoder architecture trained under a maximum likelihood objective. Recent advances include introducing reinforcement learning to tune network parameters. This research focuses on generating affective responses, in an attempt to create conversational agents that cater to the emotional context of a conversation. We propose a novel approach: a bidirectional RNN encoder-decoder seq2seq model with an attention mechanism and maximum mutual information as the initial objective function, combined with reinforcement learning (RL). As input, we train our own word2vec embeddings and append valence, arousal, and dominance (VAD) scores to them. We adopt an AlphaGo-style strategy, initializing the RL system with a general response policy learned by our seq2seq model and then tuning it with a policy gradient method. The internal rewards are ease of answering, semantic coherence, and emotional intelligence, the last incorporated by minimizing affective dissonance between the source and the generated response. We use a two-part training scheme: we first train a separate external reward analyzer to predict rewards from human feedback, and then apply the RL system to the predicted rewards to maximize the expected reward (both internal and external). We train two models on two different datasets, the Cornell Movie Dialog Corpus and Yelp Restaurant Review, and tabulate the affective responses our models generate. We evaluate our models with standard metrics such as BLEU, perplexity, and ROUGE-L, along with human evaluation.
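To make the embedding scheme concrete, the following is a minimal sketch of appending VAD scores to pre-trained word vectors. It assumes the word2vec table and the VAD lexicon are available as plain dictionaries; the function name, the lexicon format, and the NEUTRAL_VAD fallback for out-of-lexicon words are hypothetical illustrations, not the paper's exact implementation.

    import numpy as np
    from typing import Dict, Tuple

    # Assumed neutral default for words absent from the VAD lexicon
    # (mid-scale valence and dominance, low arousal on a 1-9 scale).
    NEUTRAL_VAD = (5.0, 1.0, 5.0)

    def append_vad(
        word2vec: Dict[str, np.ndarray],
        vad_lexicon: Dict[str, Tuple[float, float, float]],
    ) -> Dict[str, np.ndarray]:
        """Concatenate a 3-d (valence, arousal, dominance) score
        onto each word vector, yielding (d + 3)-d input embeddings."""
        augmented = {}
        for word, vec in word2vec.items():
            vad = np.asarray(
                vad_lexicon.get(word, NEUTRAL_VAD), dtype=vec.dtype
            )
            augmented[word] = np.concatenate([vec, vad])
        return augmented

    # Toy usage with a hypothetical 4-d embedding:
    w2v = {"happy": np.ones(4, dtype=np.float32)}
    lex = {"happy": (8.2, 6.0, 6.3)}
    print(append_vad(w2v, lex)["happy"])  # 7-d vector: embedding + VAD

Augmenting the input embeddings this way lets the encoder observe each token's affective coordinates directly, which is what allows the emotional-intelligence reward to operate on the dissonance between source and response.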