ConvoKit: Conversational Analysis Toolkit

This toolkit contains tools to extract conversational features and analyze social phenomena in conversations, using a single unified interface inspired by (and compatible with) scikit-learn. Several large conversational datasets are included together with scripts exemplifying the use of the toolkit on these datasets. The latest version is 4.1.0 (released Mar. 10, 2026); follow the project on GitHub to keep track of updates.
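To illustrate what a scikit-learn-style interface looks like when applied to conversations, here is a minimal, self-contained sketch of the fit/transform pattern. It does not use ConvoKit itself: the `Utterance` and `WordCountAnnotator` classes and the `n_words` and `above_mean` fields are hypothetical names invented for this example. See the documentation for the toolkit's actual classes.

```python
from dataclasses import dataclass, field


@dataclass
class Utterance:
    """Toy stand-in for one conversational turn (hypothetical, not a ConvoKit class)."""
    speaker: str
    text: str
    meta: dict = field(default_factory=dict)


class WordCountAnnotator:
    """Hypothetical transformer that follows the scikit-learn fit/transform convention."""

    def fit(self, utterances):
        # Learn a corpus-level statistic: the mean number of words per utterance.
        counts = [len(u.text.split()) for u in utterances]
        self.mean_words_ = sum(counts) / len(counts) if counts else 0.0
        return self  # returning self allows chaining, as in scikit-learn

    def transform(self, utterances):
        # Annotate each utterance in place with its word count and whether
        # that count exceeds the mean learned in fit().
        for u in utterances:
            n = len(u.text.split())
            u.meta["n_words"] = n
            u.meta["above_mean"] = n > self.mean_words_
        return utterances

    def fit_transform(self, utterances):
        return self.fit(utterances).transform(utterances)


conversation = [
    Utterance("alice", "Could you take a look at this patch?"),
    Utterance("bob", "Sure."),
    Utterance("alice", "Thanks, I appreciate it."),
]

annotated = WordCountAnnotator().fit_transform(conversation)
for u in annotated:
    print(u.speaker, u.meta)
```

The point of the pattern is that every analysis step exposes the same `fit` and `transform` methods and writes its results back onto the conversational data as metadata, so steps can be chained or swapped without changing the surrounding code.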

Documentation

Documentation is hosted here.

If you are new to ConvoKit, great places to get started are:

For an overview, watch our SIGDIAL talk introducing the toolkit:

Community & Support

Join our Discord community to:

  • Get help with installation and usage
  • Stay updated on the latest releases
  • Discuss progress, features, and issues
  • Share your work and connect with others

Citation

If you use ConvoKit code or datasets, please acknowledge the respective components in addition to:

Jonathan P. Chang, Caleb Chiam, Liye Fu, Andrew Wang, Justine Zhang, Cristian Danescu-Niculescu-Mizil. 2020. “ConvoKit: A Toolkit for the Analysis of Conversations”. Proceedings of SIGDIAL.

Funding

ConvoKit is funded in part by the U.S. National Science Foundation under Grant No. IIS-1750615 (CAREER). Any opinions, findings, and conclusions in this work are those of the author(s) and do not necessarily reflect the views of Cornell University or the National Science Foundation.