My research helps language models tokenize and represent text better. Here are my most recent and representative publications:

Other places to browse my publications:

Google Scholar

Semantic Scholar

DBLP

ORCID

Link	Thawani A., Ghanekar S., Kumar D., Pujara J. Does Subword Vocabulary hold back Machine Translation? (submitted 2024).

Thawani A., Ghanekar S., Zhu X., Pujara J. Learn Your Tokens: Word-Pooled Tokenization for Language Modeling. EMNLP 2023 Findings.

Arxiv

Presentation

Thawani A., Pujara J., Kalyan A. Estimating Numbers without Regression. Negative Insights workshop at EACL 2023.

Kumar D.*, Thawani A.* BPE with N-Grams and Skip-Grams. Negative Insights workshop at ACL 2022. (*equal contribution)

Thawani A. et al. Numeracy enhances Literacy in Language Models. EMNLP (2021).

Thawani A. et al. Representing Numbers in NLP: a Survey and a Vision. NAACL (2021).

PDF

Slides

Thawani A. et al. Entity Linking to Knowledge Graphs to Infer Column Types and Properties. SemTab @ ISWC (2019).

PDF

Code

Thawani A. et al. Are Online Reviews of Physicians Biased Against Female Providers? MLHC (2019).

Anthology

Poster

Code

Thawani A. et al. SWOW-8500: Word Association task for Intrinsic Evaluation of Word Embeddings. RepEval @ NAACL (2019).

Anthology

Singh A.K., Thawani A., Panchal M., Gupta A., & McAuley J. IJCNLP-2017 Task 3: Review Opinion Diversification (RevOpiD-2017). IJCNLP (2017).

PDF	Singh A.K., Thawani A., Gupta A., & Mundotiya R.K. Evaluating Opinion Summarization in Ranking. AIRS (2017).