Stanford EE Computer Systems Colloquium

4:30 PM, Wednesday, March 13, 2019
Shriram Center for Bioengineering and Chemical Engineering Room 104
http://ee380.stanford.edu

Natural Language Processing for Production-Level Conversational Interfaces

Karthik Raghunathan and Arushi Raghuvanshi
Cisco Systems

About the talk:

Conversational applications are often over-hyped and underperform. While there has been significant progress in Natural Language Understanding (NLU) in academia and there is a large, growing market for voice-based technologies, NLU performance drops significantly when the input contains typos or other errors, uncommon vocabulary, or more complex requests. This talk will cover how to build a production-quality conversational app that performs well in a real-world setting.

We will demonstrate an end-to-end approach for consistently building conversational interfaces with production-level accuracies that has proven to work well for a number of applications across diverse verticals. Building successful conversational interfaces involves choosing the right use case, collecting clean and relevant data, and breaking down the NLU problem into a series of solvable sub-tasks. All of today’s most widely used conversational services have been built using a similar hierarchical NLU pipeline of domain-intent-entity classification that has become an industry standard, which we will discuss in detail.
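As a rough illustration of the hierarchical domain-intent-entity idea, here is a minimal Python sketch. It is not the speakers' implementation: the keyword rules stand in for trained classifiers, and every name in it (DOMAIN_KEYWORDS, INTENT_KEYWORDS, ENTITY_GAZETTEER, parse) is hypothetical. It only shows how each level narrows the problem for the next one.

```python
from dataclasses import dataclass, field

# Hypothetical keyword rules standing in for trained classifiers at each level.
DOMAIN_KEYWORDS = {
    "food_ordering": {"order", "pizza", "menu"},
    "weather": {"weather", "forecast", "rain"},
}
# One intent model per domain.
INTENT_KEYWORDS = {
    "food_ordering": {"place_order": {"order"}, "show_menu": {"menu"}},
    "weather": {"get_forecast": {"weather", "forecast", "rain"}},
}
# One entity recognizer (here, a simple gazetteer) per intent.
ENTITY_GAZETTEER = {
    "place_order": {"pizza": "dish", "large": "size", "small": "size"},
    "show_menu": {},
    "get_forecast": {"today": "date", "tomorrow": "date"},
}


@dataclass
class Parse:
    domain: str
    intent: str
    entities: list = field(default_factory=list)


def best_label(tokens, keyword_map):
    """Pick the label whose keyword set overlaps the query tokens the most."""
    return max(keyword_map, key=lambda label: len(keyword_map[label] & tokens))


def parse(query):
    """Run the three stages in order: domain, then intent, then entities."""
    words = query.lower().split()
    tokens = set(words)
    domain = best_label(tokens, DOMAIN_KEYWORDS)
    intent = best_label(tokens, INTENT_KEYWORDS[domain])
    gazetteer = ENTITY_GAZETTEER[intent]
    entities = [(word, gazetteer[word]) for word in words if word in gazetteer]
    return Parse(domain, intent, entities)


if __name__ == "__main__":
    print(parse("order a large pizza"))
```

Because the intent model is selected by the predicted domain, and the entity recognizer by the predicted intent, each stage only has to discriminate among a small set of labels.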

Our architecture further improves on this standard domain-intent-entity classification and dialogue management architecture by leveraging shallow semantic parsing. We observed that NLU systems for industry applications often require more structured representations of entity relations than those provided by the standard hierarchy, yet do not require full semantic or syntactic parses, which are often inaccurate on real-world conversational data. We describe our approach and demonstrate how it improves the performance of conversational interfaces for non-trivial use cases.
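One way to picture a structured representation of entity relations that stops short of a full parse is to attach each modifier entity to a nearby head entity. The Python sketch below is an assumption-laden illustration, not the approach presented in the talk: the ALLOWED_DEPENDENTS table, the nearest-head rule, and the tie-break toward the following head (modifiers usually precede their noun in English) are all hypothetical choices.

```python
# Hypothetical configuration: which entity types may attach to which head type.
ALLOWED_DEPENDENTS = {"dish": {"size", "quantity", "option"}}


def group_entities(entities):
    """Group flat entities into head/dependent structures.

    entities: list of (text, entity_type) tuples in utterance order.
    Returns one dict per head entity, with its attached dependents.
    """
    heads = [
        (i, text, etype)
        for i, (text, etype) in enumerate(entities)
        if etype in ALLOWED_DEPENDENTS
    ]
    groups = {
        i: {"head": text, "type": etype, "dependents": []}
        for i, text, etype in heads
    }
    for i, (text, etype) in enumerate(entities):
        if i in groups:
            continue  # heads are not attached to anything
        candidates = [h for h, _, htype in heads if etype in ALLOWED_DEPENDENTS[htype]]
        if not candidates:
            continue  # no compatible head; leave the entity ungrouped
        # Nearest head wins; on a tie, prefer the head that follows the modifier.
        nearest = min(candidates, key=lambda h: (abs(h - i), h < i))
        groups[nearest]["dependents"].append((text, etype))
    return [groups[i] for i, _, _ in heads]


if __name__ == "__main__":
    flat = [
        ("two", "quantity"),
        ("large", "size"),
        ("pizzas", "dish"),
        ("one", "quantity"),
        ("salad", "dish"),
    ]
    for group in group_entities(flat):
        print(group)
```

For "two large pizzas and one salad", a flat entity list cannot say which quantity belongs to which dish; the grouped output can, without needing a full syntactic parse.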

We end the talk by discussing the additional challenges in building a voice assistant rather than a text-based chatbot. Large-vocabulary, domain-agnostic Automatic Speech Recognition (ASR) systems often mis-transcribe domain-specific words and phrases. Since these generic ASR systems are the first components of most voice assistants in production, building NLU systems that are robust to these errors can be a challenging task. We describe a few potential methods for handling ASR errors in the NLU pipeline, especially in the entity classification and resolution component, which is the most susceptible to ASR errors.
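To make the entity resolution problem concrete, the Python sketch below maps a possibly mis-transcribed span to the closest canonical entity value using character-level similarity from the standard library's difflib. This is only one simple stand-in, not one of the methods described in the talk; real ASR errors are often phonetic, which plain string similarity handles only partially. The CANONICAL_DRINKS list, the resolve_entity function, and the 0.6 cutoff are assumptions made for illustration.

```python
import difflib

# Hypothetical canonical entity values for a coffee-ordering assistant.
CANONICAL_DRINKS = ["cappuccino", "espresso", "americano", "latte"]


def resolve_entity(span, canonical=CANONICAL_DRINKS, cutoff=0.6):
    """Map a possibly mis-transcribed span to a canonical value, or None."""
    normalized = span.lower().strip()
    if normalized in canonical:
        return normalized  # exact match, no fuzzy lookup needed
    # get_close_matches returns candidates sorted by similarity, best first.
    matches = difflib.get_close_matches(normalized, canonical, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    for heard in ["expresso", "cappucino", "Latte", "xyz"]:
        print(heard, "->", resolve_entity(heard))
```

Returning None below the cutoff lets the dialogue manager ask a clarifying question rather than act on a poor guess.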

After this talk, attendees will have a better appreciation for the challenges and nuances of building real-world NLU systems, as well as a high-level understanding of the best practices and components needed to build their own production-quality conversational assistant.

Slides:

Download the slides for this presentation in PDF format.

Video:

To access the live webcast of the talk (active at 16:28 on the day of the presentation) and the archived version of the talk, use the URL SU-EE380-20190313. This is a first-class reference and can be transmitted by email, Twitter, etc.

A URL referencing a YouTube view of the lecture will be posted HERE a week or so following the presentation.

About the Speaker

[speaker photo] Karthik Raghunathan is the Head of Machine Learning at Cisco's Webex Intelligence Group. Karthik used to be the Director of Research at MindMeld, a leading AI company that powered conversational interfaces for some of the world's largest retailers, media companies, government agencies and automotive manufacturers. MindMeld was acquired by Cisco in May 2017. Karthik has more than 10 years of combined experience working at reputed academic and industrial research labs on the problems of speech, natural language processing, and information retrieval. Prior to joining MindMeld, he was a Senior Scientist in the Microsoft AI & Research Group, where he worked on conversational interfaces such as the Cortana digital assistant and voice search on Bing and Xbox.

Karthik holds an MS in Computer Science with Distinction in Research in Natural Language Processing from Stanford University. He was co-advised by professors Daniel Jurafsky and Christopher Manning, and his graduate research focused on the problems of Coreference Resolution and Statistical Machine Translation. Karthik is a co-inventor on two US patents and has publications in leading AI conferences such as EMNLP, SIGIR and AAAI.

[speaker photo] Arushi Raghuvanshi is a Senior Machine Learning Engineer at Cisco through the acquisition of MindMeld, where she builds production-level conversational interfaces. She has developed instrumental components of the core Natural Language Processing platform, drives the effort on active learning to improve models in production, and is leading new initiatives such as speaker identification. Prior to MindMeld, Arushi earned her Master's degree in Computer Science with an Artificial Intelligence specialization from Stanford University. She also holds a Bachelor's degree from Stanford in Computer Science with a secondary degree in Electrical Engineering. Her prior industry experience includes time working at Microsoft, Intel, Jaunt VR, and founding a startup backed by Pear Ventures and Lightspeed Ventures. Arushi has publications in leading conferences including EMNLP, IEEE WCCI, and IEEE ISMVL.