ConvoKit: Conversational Analysis Toolkit

This toolkit contains tools to extract conversational features and analyze social phenomena in conversations, using a single unified interface inspired by (and compatible with) scikit-learn. Several large conversational datasets are included together with scripts exemplifying the use of the toolkit on these datasets. The latest version is 4.1.0 (released Mar. 10, 2026); follow the project on GitHub to keep track of updates.
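To make the "scikit-learn-inspired interface" concrete, here is a minimal pure-Python sketch of the fit/transform convention applied to conversational data. It does not use ConvoKit itself: `ToyUtterance` and `WordCountAnnotator` are hypothetical names invented for this illustration, and ConvoKit's own classes and signatures may differ, so consult the documentation for the real API.

```python
from dataclasses import dataclass, field


@dataclass
class ToyUtterance:
    """Hypothetical stand-in for an utterance: a speaker, text, and metadata."""
    speaker: str
    text: str
    meta: dict = field(default_factory=dict)


class WordCountAnnotator:
    """Hypothetical transformer following the scikit-learn fit/transform convention."""

    def fit(self, utterances):
        # A plain word count has nothing to learn; return self so calls can be chained.
        return self

    def transform(self, utterances):
        # Annotate each utterance in place and return the same collection.
        for utt in utterances:
            utt.meta["word_count"] = len(utt.text.split())
        return utterances

    def fit_transform(self, utterances):
        return self.fit(utterances).transform(utterances)


toy_corpus = [
    ToyUtterance("alice", "Has anyone tried the new release?"),
    ToyUtterance("bob", "Yes, it works well."),
]
annotated = WordCountAnnotator().fit_transform(toy_corpus)
word_counts = [utt.meta["word_count"] for utt in annotated]
```

The point of the pattern is that every analysis step exposes the same small set of methods and attaches its results to the data as metadata, so steps can be swapped or chained without changing the surrounding code.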

Documentation

Documentation is hosted here.

If you are new to ConvoKit, great places to get started are:

  • The Core Concepts tutorial for an overview of the ConvoKit “philosophy” and object model

  • The High-level tutorial for a walkthrough of how to import ConvoKit into your project, load a Corpus, and use ConvoKit functions

For an overview, watch our SIGDIAL talk introducing the toolkit.

Community & Support

Join our Discord community to:

  • Get help with installation and usage

  • Stay updated on the latest releases

  • Discuss progress, features, and issues

  • Share your work and connect with others

Citation

If you use the code or datasets distributed with ConvoKit, please acknowledge the work tied to the respective component (indicated in the documentation) in addition to:

Jonathan P. Chang, Caleb Chiam, Liye Fu, Andrew Wang, Justine Zhang, Cristian Danescu-Niculescu-Mizil. 2020. “ConvoKit: A Toolkit for the Analysis of Conversations”. Proceedings of SIGDIAL.

Funding

ConvoKit is funded in part by the U.S. National Science Foundation under Grant No. IIS-1750615 (CAREER). Any opinions, findings, and conclusions in this work are those of the author(s) and do not necessarily reflect the views of Cornell University or the National Science Foundation.