Avijit Thawani





I’m Avijit Thawani, a Computer Science PhD student at USC. Friends (as if I have any) call me Avi. I work on Representation Learning within Natural Language Processing, with Jay Pujara at the Information Sciences Institute (ISI). I did my undergrad and masters in Computer Science at the Indian Institute of Technology (IIT BHU), Varanasi.

I’ve been fortunate to also be able to do research with a number of amazing mentors:

Feel free to contact me about my work, potential collaboration plans, or to discuss any ideas at thawani@usc.edu. Here are a few other pointers to knowing me: Twitter; Medium; LinkedIn; Resume.


June 2022: Interning at Amazon Lab126 with the Alexa Conversations team, loosely on the topic of compositional generalization.

May 2022: Presenting work with Dipesh Kumar, at the ACL 2022 Workshop on Negative Insights.

Sept 2021: I wrote my first opinion piece which stood first in a weekly contest by The Print: Will Panjshir become a Taiwan? Afghanistan’s story matches with China. Thanks to a course I took with Professor Joshua Goldstein for the idea!

Aug 2021: Our short paper was accepted to EMNLP 2021. We showed that Numeracy enhances Literacy in Language Models (or is it Foundation Models now)! TL;DR: Simple changes to number tokenization help models predict words better.

July 2021: Wrapped up my internship with AI2 and wrote a short story around AGI/Blockchain. I’m also learning how to make Chrome browser extensions, starting with an alternative to https://blocksite.co/, which would otherwise have cost me $11 per month! Here’s a free version for anyone: https://github.com/avi-jit/blocker.

June 2021: I’ll be attending NAACL 2021 and presenting our survey on Number Representations in NLP. I’m also excited to hear more about other awesome papers, such as those described in Sebastian Ruder’s NLP newsletter!

May 2021: We submitted two papers to EMNLP: one’s a revision of an ACL rejection and another’s a side project with Dipesh Kumar from IIT BHU. I’ve also begun my AI2 internship with Ashwin Kalyan as my mentor. Here’s my intro slide!

Apr 2021: Tragic month in India. In between arranging oxygen for dying relatives and myself recovering from Covid-19, I tried to visualize the scale of the Indian crisis for Americans to better comprehend it.

Meanwhile, our NAACL 2021 Survey on Numeracy in NLP was featured in Sebastian Ruder’s NLP newsletter!

Mar 2021: Our survey on number representations was accepted to NAACL 2021. Here’s a preprint link and a short twitter thread describing the same!

Feb 2021: Volunteered to write a layperson article on human-AI trust for the ISI Communications team.

Jan 2021: Submitted a paper (link removed temporarily) to ACL 2021 on number representations in NLP.

Dec 2020: I’ll be interning with AI2’s Team Aristo in Summer 2021.

Nov 2020: Submitted a paper (link removed temporarily) to NAACL 2021 on number representations in NLP.

Oct 2020: My (ongoing) work on number representations was accepted at West Coast NLP 2020. Here is the 1-pg abstract (link removed temporarily). Looking forward to presenting on 30th October 2020.

Sept 2020: We raised funds to cover registration fees for four Indian undergrads to attend EMNLP 2020. In other news, TG, Harsh, and I submitted a proposal to the government of India on identifying Indian vernacular NLP as an emerging technology. Update: Our proposal was unfortunately not selected, but we’d love to hear your feedback, so here’s the link.

July 2020: At ISI’s Graduate Student Symposium GSS 2020, I presented my (ongoing) work on number representations (poster link removed temporarily) and frame semantics (slides; video).

June 2020: I’ll be attending MLSS 2020 and ACL 2020. I’ll present my (ongoing) work on number representations (video link removed temporarily) at the former. EDIT: Here’s a conference report by Dr Vered Shwartz on the latter.

April 2020: I’ve been selected to attend MLSS Tübingen: Machine Learning Summer School along with 179 more students (out of 1300+ applicants).

Oct 2019: We ranked third in the IBM-sponsored Table-to-KG matching challenge at the International Semantic Web Conference (ISWC 2019). Here’s the system description paper we wrote, and here are the slides. I also wrote a blog about my trip to ISWC.

Oct 2019: Selected as a volunteer for TechCrunch Disrupt SF 2019!

Sept 2019: Attended SoCalNLP 2019.

Sept 2019: I won a travel grant to attend WeCNLP 2019 at Facebook HQ, Menlo Park. The view up there is pretty amazing!

Aug 2019: Presenting at MLHC 2019, Michigan, joint work with Byron Wallace on studying gender bias in online physician reviews.

July 2019: Attending SIGGRAPH 2019, Los Angeles.

June 2019: Attending ICML 2019, Long Beach.

June 2019: Presenting a poster at the RepEval 2019 workshop colocated with NAACL, Minneapolis. Here’s a nice GitHub repo to get you started on our Word Association Task for Word Embeddings!

June 2019: Joined University of Southern California as a PhD student. I’ll be working with Pedro Szekely and Jay Pujara at the Center on Knowledge Graphs, Information Sciences Institute, Los Angeles. Looking forward to the DARPA Machine Commonsense project. I will be supported by the Annenberg Fellowship!

May 2019: Defended my Master’s thesis on Opinion Mining with word and contextualized embeddings. Bidding adieu to a great five years at IIT BHU :)

May 2019: Paper accepted (co-authors: Biplav Srivastava and Anil Kumar Singh) at the RepEval 2019 workshop. See you at NAACL, June 2-7, Minneapolis!

Jan 2019: Accepted into the Computer Science PhD programs at University of Southern California, Los Angeles and Northeastern University, Boston.

Dec 2018: Three amazing job offers from Samsung, Myntra, and Headout.

21st April 2018: My long short film Stopping by Woods is now on YouTube (EDIT: over 50,000 views). Do watch and hit like if you like!

March 2018: In the summer of 2018, I’ll be heading to Northeastern University for an internship under Dr. Byron Wallace’s guidance. See you in Boston!

Feb 2018: We’re done with the shooting of my upcoming short film (tentatively) titled Stopping by Woods. So excited to begin editing as soon as my mid-semesters end!

Dec 2017: We’re organising the 2nd workshop on Review Opinion Diversification at ACM Hypertext (9-12 July, 2018). See you in Baltimore!