Highlights from CoNLL and EMNLP 2019

CoNLL and EMNLP, two top-tier natural language processing conferences, were held in Hong Kong last month. A large contingent of the Square AI team, myself included, attended, and our fantastic intern, Justin Dieter, presented our work on a new contextual language generation task: mimic rephrasals. Despite being a regular conference attendee, I was surprised by the sheer quantity and quality of innovative ideas on display: a true testament to how fast the field is moving. It’s impossible to cover everything that happened, but in this post I’ve tried to capture a sampling of the ideas I found most exciting in the sessions I attended. ...

December 3, 2019 · 22 min · 4526 words · Arun Tejasvi Chaganty

Why we need human evaluation.

We’re witnessing an exciting boom in the subfield of natural language generation (NLG), with more than 150 related papers published at ACL, NAACL, and EMNLP in just the last year! These papers cover a range of tasks, including abstractive summarization (Nallapati et al., 2016), open-response question answering (Nguyen et al., 2016; Kočisky et al., 2017), image captioning (Lin et al., 2014), and open-domain dialogue (Lowe et al., 2017b). Unfortunately, it’s incredibly hard to compare these methods in a meaningful way. While most papers report automatic evaluations using metrics like BLEU or ROUGE, these metrics have consistently been shown to correlate poorly with human judgments of fluency, redundancy, overall quality, and so on. On the other hand, only a small fraction of these papers actually conduct a thorough human evaluation. ...

July 10, 2018 · 10 min · 2080 words · Arun Tejasvi Chaganty

Topic Models, Gaussian Integrals, and Getting Scooped

Today’s post is a brief overview of one of my research projects; one that, unfortunately, has already been done before. It’s going to be a little mathematical, but I’ll try to provide enough intuitive reasoning that you don’t really have to be at one with the mathematics.

Topic Models

So, these days, every Tom, Dick and Harry can crawl the web or some other incomprehensibly large source of news, views and… I’d rather not continue. What could you do with this kind of data? Well, you could use it to classify or cluster web pages for use in search engines like DuckDuckGo, or for recommending books, etc. While analysing these documents’ text, a single word like “files” on its own could have several different connotations, depending on whether you’re talking about computers, bureaucracy, or breaking out of a prison. However, you might be able to disambiguate which “topic” is being talked about based on other words you saw, like “screen”, “touchpad”, and “Steve Jobs”. ...

March 25, 2012 · 9 min · 1774 words · Arun Tejasvi Chaganty