Avijit Thawani





My research helps language models tokenize and represent text better. Here are my most recent and representative publications:

Other links to explore my publications: Google Scholar, Semantic Scholar, DBLP, ORCID.

Thawani A., Ghanekar S., Kumar D., Pujara J. Does Subword Vocabulary hold back Machine Translation? (submitted 2024). (Links: Link)
Thawani A., Ghanekar S., Zhu X., Pujara J. Learn Your Tokens: Word-Pooled Tokenization for Language Modeling. EMNLP 2023 Findings. (Links: Anthology, arXiv, Poster, Code)
Thawani A., Pujara J., Kalyan A. Estimating Numbers without Regression. Negative Insights workshop at EACL 2023. (Links: arXiv, Presentation)
Kumar D.*, Thawani A.* BPE with N-Grams and Skip-Grams. Negative Insights workshop at ACL 2022. (*equal contribution) (Links: Anthology, Slides, Reviews, Code)
Thawani A. et al. Numeracy enhances Literacy in Language Models. EMNLP (2021). (Links: Anthology, Slides, Video, Poster, Thread, Code, ACL21 Reviews)
Thawani A. et al. Representing Numbers in NLP: a Survey and a Vision. NAACL (2021). (Links: Coverage, Anthology, arXiv, Slides, Video, Thread)
Thawani A. et al. Entity Linking to Knowledge Graphs to Infer Column Types and Properties. SemTab @ ISWC (2019). (Links: PDF, Slides)
Thawani A. et al. Are Online Reviews of Physicians Biased Against Female Providers? MLHC (2019). (Links: PDF, Code)
Thawani A. et al. SWOW-8500: Word Association task for Intrinsic Evaluation of Word Embeddings. RepEval @ NAACL (2019). (Links: Anthology, Poster, Code)
Singh A.K., Thawani A., Panchal M., Gupta A., & McAuley J. IJCNLP-2017 Task 3: Review Opinion Diversification (RevOpiD-2017). IJCNLP (2017). (Links: Anthology)
Singh A.K., Thawani A., Gupta A., & Mundotiya R.K. Evaluating Opinion Summarization in Ranking. AIRS (2017). (Links: PDF)