Utility Functions
-
convokit.util.deprecation(prev_name: str, new_name: str, stacklevel: int = 3)
Suppressible deprecation warning.
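As an illustration of what a suppressible deprecation warning can look like, here is a minimal sketch built on the standard `warnings` module. The function name `deprecation_sketch`, the message wording, and the choice of `FutureWarning` are assumptions made for this example and are not taken from the ConvoKit source.

```python
import warnings


def deprecation_sketch(prev_name: str, new_name: str, stacklevel: int = 3) -> None:
    """Hypothetical stand-in for a suppressible deprecation warning."""
    # Assumed message and category; the library's own wording may differ.
    warnings.warn(
        f"{prev_name} is deprecated. Use {new_name} instead.",
        category=FutureWarning,
        stacklevel=stacklevel,
    )


# Because the warning goes through the warnings module, callers can silence it.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=FutureWarning)
    deprecation_sketch("old_function()", "new_function()")  # filtered out here
```

The `stacklevel` argument controls which stack frame the warning is attributed to, so that the message points at the code calling the deprecated name rather than at the helper itself.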
-
convokit.util.download(name: str, verbose: bool = True, data_dir: str = None, use_newest_version: bool = True, use_local: bool = False) → str
Use this to download (or use saved) convokit data by name.
- Parameters
name –
Which item to download. Currently supported:
- "wiki-corpus": Wikipedia Talk Page Conversations Corpus
A medium-size collection of conversations from Wikipedia editors' talk pages. (see http://www.cs.cornell.edu/~cristian/Echoes_of_power.html)
- "wikiconv-<year>": Wikipedia Talk Page Conversations Corpus
Conversations data for the specified year.
- "supreme-corpus": Supreme Court Dialogs Corpus
A collection of conversations from the U.S. Supreme Court Oral Arguments. (see http://www.cs.cornell.edu/~cristian/Echoes_of_power.html)
- "parliament-corpus": UK Parliament Question-Answer Corpus
Parliamentary question periods from May 1979 to December 2016. (see http://www.cs.cornell.edu/~cristian/Asking_too_much.html)
- "conversations-gone-awry-corpus": Wiki Personal Attacks Corpus
Wikipedia talk page conversations that derail into personal attacks, as labeled by crowdworkers. (see http://www.cs.cornell.edu/~cristian/Conversations_gone_awry.html)
- "conversations-gone-awry-cmv-corpus"
Discussion threads on the subreddit ChangeMyView (CMV) that derail into rule-violating behavior. (see http://www.cs.cornell.edu/~cristian/Conversations_gone_awry.html)
- "movie-corpus": Cornell Movie-Dialogs Corpus
A large metadata-rich collection of fictional conversations extracted from raw movie scripts. (see https://www.cs.cornell.edu/~cristian/Chameleons_in_imagined_conversations.html)
- "tennis-corpus": Tennis post-match press conference transcripts
Transcripts of tennis singles post-match press conferences for major tournaments from 2007 to 2015. (see http://www.cs.cornell.edu/~liye/tennis.html)
- "reddit-corpus-small": Reddit Corpus (sampled)
A sample from 100 highly-active subreddits.
- "subreddit-<subreddit-name>": Subreddit Corpus
A corpus made from the given subreddit.
- "friends-corpus": Friends TV show Corpus
A collection of all the conversations that occurred over 10 seasons of Friends, a popular American TV sitcom that ran in the 1990s.
- "switchboard-corpus": Switchboard Dialog Act Corpus
A collection of 1,155 five-minute telephone conversations between two participants, annotated with speech act tags.
- "persuasionforgood-corpus": Persuasion For Good Corpus
A collection of online conversations where a persuader tries to convince a persuadee to donate to charity.
- "iq2-corpus": Intelligence Squared Debates Corpus
Transcripts of debates held as part of Intelligence Squared Debates.
- "diplomacy-corpus": Deception in Diplomacy Corpus
Dataset with intended and perceived deception labels in the negotiation-based game Diplomacy.
- "reddit-coarse-discourse-corpus": Coarse Discourse Sequence Corpus
Reddit dataset with utterances containing discourse act labels.
- "chromium-corpus": Chromium Conversations Corpus
A collection of almost 1.5 million conversations and 2.8 million comments posted by developers reviewing proposed code changes in the Chromium project.
- "wikipedia-politeness-corpus": Wikipedia Politeness Corpus
A corpus of politeness annotations on requests from Wikipedia talk pages.
- "stack-exchange-politeness-corpus": Stack Exchange Politeness Corpus
A corpus of politeness annotations on requests from Stack Exchange.
verbose – Print checkpoint statements for download
data_dir – Output path of downloaded file (default: ~/.convokit)
use_newest_version – Re-download if a newer version is found
use_local – If True, use the local version of the corpus if it exists (regardless of whether a newer version exists)
- Returns
The path to the downloaded item.
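Two of the supported names are templates ("wikiconv-<year>" and "subreddit-<subreddit-name>"), so a caller has to build the string before passing it to `download`. The sketch below shows one way to do that. The helper names `wikiconv_corpus_name`, `subreddit_corpus_name`, and `fetch_corpus_path` are hypothetical and exist only for this example; only the `download` call and its parameters come from the signature documented above.

```python
from typing import Optional


def wikiconv_corpus_name(year: int) -> str:
    """Fill in the "wikiconv-<year>" template (hypothetical helper)."""
    return f"wikiconv-{year}"


def subreddit_corpus_name(subreddit: str) -> str:
    """Fill in the "subreddit-<subreddit-name>" template (hypothetical helper)."""
    return f"subreddit-{subreddit}"


def fetch_corpus_path(name: str, data_dir: Optional[str] = None) -> str:
    """Download the named item (or reuse a saved copy) and return its path."""
    # Imported lazily so the name helpers work without convokit installed.
    from convokit.util import download

    return download(name, data_dir=data_dir)


# Typical use (requires convokit and network access, so it is not run here):
# path = fetch_corpus_path(subreddit_corpus_name("Cornell"))
```

The returned path can then be handed to whatever loads the corpus from disk.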
-
convokit.util.download_local(name: str, data_dir: str)
Get path to a previously-downloaded local version of the corpus (which may be an older version).
- Parameters
name – name of Corpus
- Returns
string path to local Corpus
-
convokit.util.subreddit_in_grouping(subreddit: str, grouping_key: str) → bool
- Parameters
subreddit – subreddit name
grouping_key – example: “askreddit~-~blackburn”
- Returns
True if the subreddit name falls within the grouping range, False otherwise
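The grouping key in the example above appears to encode a range of subreddit names with "~-~" between the two bounds. A minimal sketch of such a range check follows. It assumes that "~-~" is the separator, that both bounds are inclusive, and that names are compared as plain strings. The function name `in_grouping_sketch` is hypothetical, and the library's own behavior may differ.

```python
def in_grouping_sketch(subreddit: str, grouping_key: str) -> bool:
    """Check whether a subreddit name lies between the two bounds of a key."""
    # Assumed key layout: "<lower bound>~-~<upper bound>".
    lower, separator, upper = grouping_key.partition("~-~")
    if not separator:
        raise ValueError(f"grouping key has no '~-~' separator: {grouping_key!r}")
    # Assumed semantics: inclusive, case-sensitive string comparison.
    return lower <= subreddit <= upper


print(in_grouping_sketch("aww", "askreddit~-~blackburn"))      # True
print(in_grouping_sketch("cornell", "askreddit~-~blackburn"))  # False
```

Under these assumptions, "aww" sorts between "askreddit" and "blackburn", while "cornell" sorts after "blackburn".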
-
convokit.util.warn(text: str)
Prepends a red-colored ‘WARNING: ’ to [text]. This is a printed warning and cannot be suppressed.
- Parameters
text – Warning message
- Returns
‘WARNING: [text]’
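For comparison with the suppressible `deprecation` warning, a printed warning with a red prefix might be sketched as below. The ANSI escape codes and the names `format_warning` and `warn_sketch` are assumptions for this example, not the library's implementation.

```python
RED = "\033[91m"   # assumed ANSI escape code for bright red text
RESET = "\033[0m"  # ANSI escape code that restores the default color


def format_warning(text: str) -> str:
    """Return the text with a red 'WARNING: ' prefix (hypothetical helper)."""
    return f"{RED}WARNING: {RESET}{text}"


def warn_sketch(text: str) -> None:
    """Print the warning directly to standard output."""
    print(format_warning(text))


warn_sketch("corpus metadata is missing")
```

Because the message is written with `print` rather than routed through the `warnings` module, warning filters have no effect on it, which matches the "cannot be suppressed" note above.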