Cornell Conversational Analysis Toolkit

This toolkit contains tools to extract conversational features and analyze social phenomena in conversations. Several large conversational datasets are included together with scripts exemplifying the use of the toolkit on these datasets.

The toolkit currently implements features for:


These datasets are included for ready use with the toolkit:

These datasets can be downloaded using the helper function. Alternatively you can access them directly here.

Data format

To use the toolkit with your own dataset, the dataset needs to be in a standard JSON format.


This toolkit requires Python 3.

  1. Install the toolkit: pip3 install convokit
  2. Download spaCy's English model: python3 -m spacy download en

Alternatively, visit our GitHub page to install from source.
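After installing, a quick check can confirm that the prerequisites are visible to your interpreter. The following is a minimal sketch using only the standard library; the function name check_environment is made up for illustration and is not part of the toolkit.

```python
import importlib.util
import sys


def check_environment(packages=("convokit", "spacy")):
    """Report whether Python 3 is running and the named packages can be found.

    Hypothetical helper for illustration; it is not part of convokit.
    """
    report = {"python3": sys.version_info.major >= 3}
    for name in packages:
        # find_spec locates a top-level package without importing it
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for item, ok in check_environment().items():
        print(f"{item}: {'ok' if ok else 'missing'}")
```

This only checks that the packages can be located; it does not verify that the spaCy English model from step 2 has been downloaded.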


See the example ipython notebooks linked above to familiarize yourself with how to use the different modules of the toolkit. The basic process is:

  1. Import convokit into your Python 3 project.
  2. Load a corpus of conversations using corpus = convokit.Corpus(filename=...); use your own corpus or one of those provided with the toolkit.
  3. Use convokit functionality to extract features from the conversations. For example, ps = convokit.PolitenessStrategies(corpus) extracts the politeness strategies used in all the conversations.
  4. Have fun analyzing conversations.
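Put together, the steps above might look like the sketch below. The two convokit calls are the ones named in the steps; the wrapper function extract_politeness and its corpus_path parameter are hypothetical names chosen for illustration, and the calls have not been checked against any particular release of the toolkit.

```python
def extract_politeness(corpus_path):
    """Load a corpus and extract politeness strategies from it.

    Hypothetical wrapper for illustration. corpus_path is assumed to point
    at a corpus file in the toolkit's JSON format.
    """
    # Step 1: import the toolkit (done lazily so this file loads without it)
    import convokit

    # Step 2: load a corpus of conversations from disk
    corpus = convokit.Corpus(filename=corpus_path)

    # Step 3: extract the politeness strategies used in the conversations
    ps = convokit.PolitenessStrategies(corpus)

    # Step 4: hand both objects back for further analysis
    return corpus, ps
```

Calling extract_politeness with the path to your own corpus file, or to one of the provided datasets, returns both the loaded corpus and the extracted features. The example notebooks remain the reference for what to do with them next.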


Documentation is hosted here.

The documentation is built with Sphinx (pip3 install sphinx). To build it yourself, navigate to doc/ and run make html.


Andrew Wang wrote the Coordination code and the respective example script, wrote the helper functions, and designed the structure of the toolkit.

Ishaan Jhaveri refactored the Question Typology code and wrote the respective example scripts.

Jonathan Chang wrote the example script for Conversations Gone Awry.