Cornell Conversational Analysis Toolkit (ConvoKit) Documentation
This toolkit contains tools to extract conversational features and analyze social phenomena in conversations, using a single unified interface inspired by (and compatible with) scikit-learn. Several large conversational datasets are included, together with scripts that demonstrate the toolkit on them. More information can be found at our website. The latest version is 3.0.1 (released Nov. 8, 2024).
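As a rough illustration of what a scikit-learn-style interface looks like when applied to conversations, the sketch below follows the `fit` / `transform` / `fit_transform` convention on a toy conversation. It does not use ConvoKit itself: `Utterance` and `WordCountAnnotator` are hypothetical names invented for this example, and the toolkit's own classes and method signatures may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Utterance:
    """Hypothetical stand-in for one conversational turn (not a ConvoKit class)."""
    speaker: str
    text: str
    meta: dict = field(default_factory=dict)


class WordCountAnnotator:
    """Hypothetical transformer that follows the scikit-learn fit/transform convention."""

    def fit(self, utterances):
        # Learn the mean utterance length (in whitespace-separated words).
        lengths = [len(u.text.split()) for u in utterances]
        self.mean_length_ = sum(lengths) / len(lengths)
        return self  # returning self allows method chaining

    def transform(self, utterances):
        # Annotate each utterance's metadata with features derived from its text.
        for u in utterances:
            n_words = len(u.text.split())
            u.meta["n_words"] = n_words
            u.meta["above_mean"] = n_words > self.mean_length_
        return utterances

    def fit_transform(self, utterances):
        return self.fit(utterances).transform(utterances)


conversation = [
    Utterance("alice", "Could you please review my edit?"),
    Utterance("bob", "Sure."),
    Utterance("alice", "Thanks, I appreciate the quick reply!"),
]

annotated = WordCountAnnotator().fit_transform(conversation)
for u in annotated:
    print(u.speaker, u.meta)
```

The point of the pattern is that every analysis step exposes the same small set of methods, so steps can be swapped or chained without changing the surrounding code.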
Contents
- Datasets
  - Conversations Gone Awry Dataset - Wikipedia version (CGA-WIKI)
  - Conversations Gone Awry Dataset - Reddit CMV version (CGA-CMV)
  - Cornell Movie-Dialogs Corpus
  - CANDOR Corpus
  - Parliament Question Time Corpus
  - Wikipedia Talk Pages Corpus
  - Tennis Interviews
  - Reddit Corpus (all, by subreddit)
  - Reddit Corpus (small)
  - WikiConv Corpus
  - Chromium Conversations Corpus
  - Winning Arguments Corpus
  - Coarse Discourse Corpus
  - Persuasion For Good Corpus
  - Intelligence Squared Debates Corpus
  - Friends Corpus
  - Spolin Corpus
  - Switchboard Dialog Act Corpus
  - Stanford Politeness Corpus (Wikipedia)
  - Stanford Politeness Corpus (Stack Exchange)
  - Deception in Diplomacy Corpus
  - Group Affect and Performance (GAP) Corpus
  - Supreme Court Oral Arguments Dataset
  - Wikipedia Articles for Deletion Dataset
  - CaSiNo Corpus
  - NPR Interviews 2P Corpus
  - Federal Open Market Committee Corpus
  - FORA Corpus
  - DeliData Corpus
- Examples