Persuasion For Good Corpus
===========================

A collection of online conversations generated by Amazon Mechanical Turk workers, where one participant (the *persuader*) tries to convince the other (the *persuadee*) to donate to a charity. This dataset contains 1017 conversations, along with demographic data and responses to psychological surveys from the participants. 300 of the conversations also carry per-sentence human annotations of persuasion-related dialogue acts and sentiment.

This is a ConvoKit-formatted version of the dataset originally distributed with the following paper (`link `_, `dataset link `_):

Wang, Xuewei, Weiyan Shi, Richard Kim, Yoojung Oh, Sijia Yang, Jingwen Zhang, and Zhou Yu. "Persuasion for Good: Towards a Personalized Persuasive Dialogue System for Social Good." Proceedings of ACL, 2019.

Dataset details
---------------

Speaker-level information
^^^^^^^^^^^^^^^^^^^^^^^^^

There are 1285 speakers in this dataset, all of them workers on Amazon Mechanical Turk. Each speaker has associated metadata from pre-task psychological surveys, as well as demographic information:

**Demographics**

* age (numerical; note that the age 3 is probably a typo)
* sex (categorical, in {Male, Female, Other})
* race (categorical, in {White, Other})
* edu (categorical, in {Less than four-year college, Four year college, Postgraduate})
* marital (categorical, in {Married, Unmarried})
* employment (categorical, in {Employed for wages, Other})
* income (numerical)
* religion (categorical, in {Atheist, Catholic, Protestant, Other religion})
* ideology (categorical, in {Conservative, Liberal, Moderate})

**Big-Five Personality Traits**

These are (continuous) numbers in [1,5].

* extrovert
* agreeable
* conscientious
* neurotic
* open

**Moral Foundations**

These are numbers in [1,6].

* care
* fairness
* loyalty
* authority
* purity
* freedom

**Schwartz Portrait Values**

These are numbers in [1,6].

* conform
* tradition
* benevolence
* universalism
* self_direction
* stimulation
* hedonism
* achievement
* power
* security

**Decision-Making Style**

These are numbers in [1,5].

* rational
* intuitive
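To illustrate how this speaker-level metadata is organized, the following is a minimal sketch of how it might be read back with ConvoKit. It assumes the corpus has been loaded as described in the Usage section below and that the metadata keys match the field names listed above:

>>> from convokit import Corpus, download
>>> corpus = Corpus(filename=download("persuasionforgood-corpus"))
>>> speaker = next(corpus.iter_speakers())
>>> # demographic fields are stored directly in the speaker's metadata
>>> speaker.meta["age"], speaker.meta["sex"], speaker.meta["ideology"]
>>> # survey scores use the same names as in the lists above
>>> {trait: speaker.meta[trait] for trait in ["extrovert", "agreeable", "conscientious", "neurotic", "open"]}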
Utterance-level information
^^^^^^^^^^^^^^^^^^^^^^^^^^^

Each utterance corresponds to a turn in a dialogue. For each utterance, we provide:

* id: index of the utterance
* speaker: the author of the utterance
* conversation_id: id of the first utterance in the dialogue this utterance belongs to
* reply_to: the id of the utterance to which this utterance is a reply (None if the utterance starts the conversation)
* text: content of the utterance

Additional metadata includes:

* role: whether the utterance's author is the persuader (0) or the persuadee (1)
* user_turn_id: i, such that this utterance is its author's i-th turn in the conversation

In addition, for the 6,136 utterances in the 300 human-annotated conversations, the following information is provided:

* label_1: the dialogue act of each sentence in the utterance, stored as a list. If the utterance is authored by the persuader these are persuasion strategies; otherwise they are dialogue acts particular to the persuadee's role. np.nan if the utterance is not annotated.
* label_2: the second dialogue act of each sentence, available only for a limited number of utterances. np.nan if the utterance is not annotated.
* sentiment: the sentiment score of each sentence, stored as a dict of lists whose entries correspond to positive, negative, and neutral sentiment. np.nan for each sentiment category if the utterance is not annotated.
* n_sents: the number of sentences in the utterance. None if the utterance is not annotated.
* text_by_sent: a string containing the utterance's text, with explicit markers denoting sentence breaks. np.nan if the utterance is not annotated.

Conversation-level information
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Each conversation contains the following metadata:

* dialogue_id: the ID of the conversation in the original dataset
* user_ee: the ID of the speaker who is the persuadee in this conversation
* user_er: the ID of the speaker who is the persuader
* donation_ee: the amount donated by the persuadee
* donation_er: the amount donated by the persuader
* is_annotated: whether or not the conversation is manually annotated

Annotated conversations also contain the following metadata:

* intended: the amount that the persuadee intends to donate, as inferred by the annotator. np.nan if the conversation is not annotated.

A sketch of how this annotation metadata can be accessed is given at the end of this page.

Usage
-----

To download directly with ConvoKit:

>>> from convokit import Corpus, download
>>> corpus = Corpus(filename=download("persuasionforgood-corpus"))

For some quick stats:

>>> corpus.print_summary_stats()
Number of Speakers: 1285
Number of Utterances: 20932
Number of Conversations: 1017

Additional note
---------------

License
^^^^^^^

Licensed under the Apache License 2.0 (license for the original dataset found `here `_).

Contact
^^^^^^^

Corpus converted into ConvoKit format by Justine Zhang, with additional work by Frank Li, Grace Deng, and Di Ni (fl338@cornell.edu, gd3435@cornell.edu, dn273@cornell.edu).
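As referenced above, the following is a minimal sketch of how the conversation- and utterance-level annotation metadata described on this page might be accessed. It assumes that is_annotated is stored as a boolean-like flag and that annotated utterances carry the label_1 and sentiment fields listed earlier; the specific label values follow the annotation scheme of the original dataset:

>>> from convokit import Corpus, download
>>> corpus = Corpus(filename=download("persuasionforgood-corpus"))
>>> annotated = [convo for convo in corpus.iter_conversations() if convo.meta["is_annotated"]]
>>> len(annotated)  # should match the 300 annotated conversations
>>> convo = annotated[0]
>>> convo.meta["donation_ee"], convo.meta["intended"]
>>> for utt in convo.iter_utterances():
...     if utt.meta["role"] == 0:  # persuader turn
...         print(utt.meta["label_1"])  # per-sentence persuasion strategies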