Stanford Politeness Corpus (Stack Exchange) ==================================================== A collection of requests from Stack Exchange, annotated with politeness (6,603 utteranecs). Distributed together with: A Computational Approach to Politeness with Application to Social Factors. Cristian Danescu-Niculescu-Mizil, Moritz Sudhof, Dan Jurafsky, Jure Leskovec, Christopher Potts. ACL, 2013. Dataset details --------------- Utterance-level information ^^^^^^^^^^^^^^^^^^^^^^^^^^^ Each utterance corresponds to a Stack Exchange request. For each utterance, we provide: * id: row index of the request given in the original data release. * speaker: the author of the utterance. * conversation_id: id of the first utterance in the conversation this utterance belongs to, which in this case is the id of the utterance itself * reply_to: None. In this dataset, each request is seen as a full conversation, and thus all utterances are at the 'root' of the conversations * timestamp: "NOT_RECORDED". * text: textual content of the utterance. Metadata for each utterance is inherited from the general CMV corpus: * Normalized Score: Normalized politeness score computed based on annotations. * Binary: A binarized politeness label where 1="polite", 0="neutral", -1 = "impolite". * Annotations: the original annotations from Amazon Mechanical Turkers for the given utterance. Ratings are on a 1-25 scale. * parsed: dependency-parsed version of the utterance text Usage ----- To download directly with ConvoKit: >>> from convokit import Corpus, download >>> corpus = Corpus(filename=download("stack-exchange-politeness-corpus")) For some quick stats: >>> len(corpus.get_utterance_ids()) 6603 Data License ^^^^^^^^^^^^ ConvoKit's Stanford Politeness Corpus is governed by the `CC BY license v4.0 `_. Copyright (C) 2017-2020 The ConvoKit Developers. Contact ^^^^^^^ Please email any questions to: cristian@cs.cornell.edu (Cristian Danescu-Niculescu-Mizil)