thawani@usc.edu
My research helps language models tokenize and represent text better. Here are my most recent and representative publications:
Other links to explore my publications: Google Scholar | Semantic Scholar | DBLP | ORCID |
Link | Thawani A., Ghanekar S., Kumar D., Pujara J. Does Subword Vocabulary hold back Machine Translation? (submitted 2024). |
Anthology | Arxiv | Poster | Code | Thawani A., Ghanekar S., Zhu X., Pujara J. Learn Your Tokens: Word-Pooled Tokenization for Language Modeling. EMNLP 2023 Findings. |
Arxiv | Presentation | Thawani A., Pujara J., Kalyan A. Estimating Numbers without Regression. Negative Insights workshop at EACL 2023. |
Anthology | Slides | Reviews | Code | Kumar D.*, Thawani A.* BPE with N-Grams and Skip-Grams. Negative Insights workshop at ACL 2022. (*equal contribution) |
Anthology | Slides | Video | Poster | Thread | Code | ACL21 Reviews | Thawani A. et al. Numeracy enhances Literacy in Language Models. EMNLP (2021). |
Coverage | Anthology | Arxiv | Slides | Video | Thread | Thawani A. et al. Representing Numbers in NLP: a Survey and a Vision. NAACL (2021). |
Slides | Thawani A. et al. Entity Linking to Knowledge Graphs to Infer Column Types and Properties. SemTab @ ISWC (2019). |
Code | Thawani A. et al. Are Online Reviews of Physicians Biased Against Female Providers? MLHC (2019). |
Anthology | Poster | Code | Thawani A. et al. SWOW-8500: Word Association task for Intrinsic Evaluation of Word Embeddings. RepEval @ NAACL (2019). |
Anthology | Singh A.K., Thawani A., Panchal M., Gupta A., & McAuley J. IJCNLP-2017 Task 3: Review Opinion Diversification (RevOpiD-2017). IJCNLP (2017). |
Singh A.K., Thawani A., Gupta A., & Mundotiya R.K. Evaluating Opinion Summarization in Ranking. AIRS (2017). |