Persuasion For Good Corpus
A collection of online conversations generated by Amazon Mechanical Turk workers, in which one participant (the persuader) tries to convince the other (the persuadee) to donate to a charity. The dataset contains 1017 conversations, along with demographic data and psychological survey responses from the participants. In addition, 300 conversations have per-sentence human annotations of persuasion-related dialogue acts and of sentiment.
This is a ConvoKit-formatted version of the dataset originally distributed with the following paper: Wang, Xuewei, Weiyan Shi, Richard Kim, Yoojung Oh, Sijia Yang, Jingwen Zhang, and Zhou Yu. “Persuasion for Good: Towards a Personalized Persuasive Dialogue System for Social Good.” Proceedings of ACL, 2019.
Dataset details
Speaker-level information
There are 1285 speakers in this dataset, all of them workers on Amazon Mechanical Turk. Each speaker has metadata from pre-task psychological surveys, as well as demographic information:
Demographics
age (numerical; an age value of 3 appears in the data and is probably a typo)
sex (categorical, in {Male, Female, Other})
race (categorical, in {White, Other})
edu (categorical, in {Less than four-year college, Four year college, Postgraduate})
marital (categorical, in {Married, Unmarried})
employment (categorical, in {Employed for wages, Other})
income (numerical)
religion (categorical, in {Atheist, Catholic, Protestant, Other religion})
ideology (categorical, in {Conservative, Liberal, Moderate})
Big-Five Personality Traits
These are (continuous) numbers in [1,5].
extrovert
agreeable
conscientious
neurotic
open
Moral Foundations
These are numbers in [1,6].
care
fairness
loyalty
authority
purity
freedom
Schwartz Portrait Values
These are numbers in [1,6].
conform
tradition
benevolence
universalism
self_direction
stimulation
hedonism
achievement
power
security
Decision-Making Style
These are numbers in [1,5].
rational
intuitive
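Each survey score above has a documented range, so out-of-range values (such as the implausible age noted under Demographics) can be flagged with a short check. This is a sketch under assumptions: it takes a speaker's metadata as a plain dict keyed by the attribute names listed above (a ConvoKit speaker's meta should behave similarly). The helper name flag_out_of_range is hypothetical, and the minimum plausible age of 18 is an assumed cutoff, not a property of the corpus.

```python
import math

# Documented score ranges, grouped as in the attribute lists above.
SCORE_GROUPS = [
    (("extrovert", "agreeable", "conscientious", "neurotic", "open"), (1.0, 5.0)),  # Big-Five
    (("care", "fairness", "loyalty", "authority", "purity", "freedom"), (1.0, 6.0)),  # Moral Foundations
    (("conform", "tradition", "benevolence", "universalism", "self_direction",
      "stimulation", "hedonism", "achievement", "power", "security"), (1.0, 6.0)),  # Schwartz values
    (("rational", "intuitive"), (1.0, 5.0)),  # Decision-Making Style
]
RANGES = {name: bounds for names, bounds in SCORE_GROUPS for name in names}

MIN_PLAUSIBLE_AGE = 18  # assumed cutoff; not part of the corpus


def _is_missing(value):
    # Treat None and float NaN as "no value recorded".
    return value is None or (isinstance(value, float) and math.isnan(value))


def flag_out_of_range(meta):
    """Return names of attributes in `meta` that fall outside their documented range.

    Missing or NaN attributes are skipped rather than flagged.
    """
    flagged = [
        name
        for name, (low, high) in RANGES.items()
        if not _is_missing(meta.get(name)) and not low <= meta[name] <= high
    ]
    age = meta.get("age")
    if not _is_missing(age) and age < MIN_PLAUSIBLE_AGE:
        flagged.append("age")
    return flagged


# Hypothetical record, not taken from the corpus.
print(flag_out_of_range({"age": 3, "extrovert": 4.2, "care": 6.5}))  # ['care', 'age']
```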
Utterance-level information
Each utterance corresponds to a turn in a dialogue.
id: index of the utterance
speaker: the author of the utterance
conversation_id: id of the first utterance in the dialogue this utterance belongs to
reply_to: the id of the utterance to which this utterance is a reply (None if the utterance starts the conversation)
text: content of the utterance
Additional metadata includes:
role: whether the utterance’s author is the persuader (0) or persuadee (1)
user_turn_id: the index i such that this utterance is its speaker’s i-th turn in the conversation
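Each utterance records the utterance it replies to, so a conversation's turn order can be rebuilt by walking the reply chain from the utterance whose reply_to is None. This is a minimal sketch. It assumes the utterances are plain dicts carrying the id and reply_to fields described above, and that each conversation forms a single linear chain (at most one reply per utterance). That shape fits a two-party dialogue, but it is still an assumption here. The helper name order_turns is hypothetical.

```python
def order_turns(utterances):
    """Return the utterance ids of one conversation in reply order.

    Assumes a linear chain: exactly one root (reply_to is None) and at most
    one reply to any given utterance; raises ValueError otherwise.
    """
    next_of = {}  # maps an utterance id to the id of the utterance replying to it
    root = None
    for utt in utterances:
        parent = utt["reply_to"]
        if parent is None:
            if root is not None:
                raise ValueError("more than one root utterance")
            root = utt["id"]
        elif parent in next_of:
            raise ValueError(f"reply chain branches at {parent}")
        else:
            next_of[parent] = utt["id"]
    if root is None:
        raise ValueError("no root utterance")
    order = [root]
    while order[-1] in next_of:
        order.append(next_of[order[-1]])
    if len(order) != len(utterances):
        raise ValueError("some utterances are not reachable from the root")
    return order
```

According to the conversation_id description, the first id returned should equal the conversation_id shared by all of the conversation's utterances.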
In addition, for 6,136 utterances in 300 human-annotated conversations, the following information is provided:
label_1: the dialogue act of each sentence of the utterance, stored as a list. If the utterance is authored by a persuader, these are persuasion strategies; otherwise, they are dialogue acts particular to the persuadee’s role. np.nan if the utterance is not annotated.
label_2: the second dialogue act in each sentence (available for a limited number of utterances). np.nan if the utterance is not annotated.
sentiment: the sentiment score for each sentence, stored as a dict of lists, where entries correspond to pos, neg and neutral sentiment. np.nan for each sentiment category if the utterance is not annotated.
n_sents: the number of sentences in the utterance. None if the utterance is not annotated.
text_by_sent: a string containing the utterance’s text, where <s> denotes sentence breaks. np.nan if the utterance is not annotated.
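For annotated utterances, text_by_sent and the per-sentence labels can be paired up. This is a sketch based on the field descriptions above. It assumes sentences are separated by <s>, that label_1 is a list with one entry per sentence, and that a float NaN (or None) marks an unannotated value. Stripping whitespace around each sentence and dropping empty pieces are assumptions about how the breaks are spaced. The helper name sentence_labels and the example labels are hypothetical.

```python
import math


def _is_missing(value):
    # Unannotated fields are NaN (a float) or None.
    return value is None or (isinstance(value, float) and math.isnan(value))


def sentence_labels(meta):
    """Pair each sentence of an annotated utterance with its label_1 act.

    Returns an empty list for unannotated utterances and raises ValueError
    if the sentence and label counts disagree.
    """
    text = meta.get("text_by_sent")
    labels = meta.get("label_1")
    if _is_missing(text) or _is_missing(labels):
        return []
    sentences = [s.strip() for s in text.split("<s>") if s.strip()]
    if len(sentences) != len(labels):
        raise ValueError(f"{len(sentences)} sentences but {len(labels)} labels")
    return list(zip(sentences, labels))


# Hypothetical utterance metadata with placeholder act names.
example = {"text_by_sent": "Hello there.<s> Would you consider donating?",
           "label_1": ["act_a", "act_b"]}
print(sentence_labels(example))
```

The length check also catches utterances whose split does not match n_sents, because a mismatch surfaces as an error rather than as misaligned labels.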
Conversation-level information
Each conversation contains the following metadata:
dialogue_id: the ID of the conversation in the original dataset.
user_ee: the ID of the user who is the persuadee in this conversation.
user_er: the ID of the user who is the persuader in this conversation.
donation_ee: the amount donated by the persuadee
donation_er: the amount donated by the persuader
is_annotated: whether or not the conversation is manually annotated
Annotated conversations also contain the following metadata:
intended: the amount that the persuadee intends to donate, as inferred by the annotator. np.nan if the conversation is not annotated.
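Annotated conversations record both the annotator's inferred intention and the actual donation, so the two can be compared directly. This is a sketch under assumptions: it takes each conversation's metadata as a plain dict with the dialogue_id, is_annotated, donation_ee and intended fields above, and it assumes donation amounts are numeric, with None or NaN when missing. The function name intention_gaps is hypothetical.

```python
import math


def intention_gaps(conversations):
    """Map dialogue_id -> (donation_ee - intended) for annotated conversations.

    Conversations where either amount is missing (None or NaN) are skipped.
    """
    gaps = {}
    for meta in conversations:
        if not meta.get("is_annotated"):
            continue
        donated, intended = meta.get("donation_ee"), meta.get("intended")
        if any(v is None or math.isnan(v) for v in (donated, intended)):
            continue
        gaps[meta["dialogue_id"]] = donated - intended
    return gaps
```

A negative gap means the persuadee donated less than the annotator inferred they intended to.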
Usage
To download directly with ConvoKit:
>>> from convokit import Corpus, download
>>> corpus = Corpus(filename=download("persuasionforgood-corpus"))
For some quick stats:
>>> corpus.print_summary_stats()
Number of Speakers: 1285
Number of Utterances: 20932
Number of Conversations: 1017
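After loading, the metadata described above can be read through the corpus objects. The sketch below counts utterances by role. It relies only on Corpus.iter_utterances() and the meta mapping on each utterance. The helper name count_by_role is hypothetical. The 0/1 to persuader/persuadee mapping comes from the role description above, and the sketch assumes the stored values are numeric. Any other value is counted as "unknown".

```python
from collections import Counter

# Mapping taken from the `role` field description; numeric values are assumed.
ROLE_NAMES = {0: "persuader", 1: "persuadee"}


def count_by_role(corpus):
    """Count utterances per speaker role in a ConvoKit-style corpus.

    Only needs corpus.iter_utterances() and a `meta` mapping on each utterance.
    """
    counts = Counter()
    for utt in corpus.iter_utterances():
        counts[ROLE_NAMES.get(utt.meta.get("role"), "unknown")] += 1
    return dict(counts)
```

Call it as count_by_role(corpus) on the corpus loaded above. Every utterance is counted exactly once, so the totals should add up to the utterance count reported by print_summary_stats().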
Additional note
Contact
Corpus converted into ConvoKit format by Justine Zhang, with additional work by Frank Li, Grace Deng and Di Ni (fl338@cornell.edu, gd3435@cornell.edu, dn273@cornell.edu).