Persuasion For Good Corpus

A collection of online conversations generated by Amazon Mechanical Turk workers, where one participant (the persuader) tries to convince the other (the persuadee) to donate to a charity. This dataset contains 1017 conversations, along with demographic data and responses to psychological surveys from users. In addition, 300 conversations have per-sentence human annotations of sentiment and of dialogue acts specific to the persuasion setting.

This is a ConvoKit-formatted version of the dataset originally distributed with the following paper (link, dataset link): Wang, Xuewei, Weiyan Shi, Richard Kim, Yoojung Oh, Sijia Yang, Jingwen Zhang, and Zhou Yu. “Persuasion for Good: Towards a Personalized Persuasive Dialogue System for Social Good.” Proceedings of ACL, 2019.

Dataset details

Speaker-level information

There are 1285 speakers in this dataset, all of them workers on Amazon Mechanical Turk. Each speaker has associated metadata from pre-task psychological surveys, as well as demographic information:

Demographics

age (numerical; note that the recorded age of 3 is probably a typo)
sex (categorical, in {Male, Female, Other})
race (categorical, in {White, Other})
edu (categorical, in {Less than four-year college, Four year college, Postgraduate})
marital (categorical, in {Married, Unmarried})
employment (categorical, in {Employed for wages, Other})
income (numerical)
religion (categorical, in {Atheist, Catholic, Protestant, Other religion})
ideology (categorical, in {Conservative, Liberal, Moderate})

Big-Five Personality Traits

These are (continuous) numbers in [1,5].

extrovert
agreeable
conscientious
neurotic
open

Moral Foundations

These are numbers in [1,6].

care
fairness
loyalty
authority
purity
freedom

Schwartz Portrait Values

These are numbers in [1,6].

conform
tradition
benevolence
universalism
self_direction
stimulation
hedonism
achievement
power
security

Decision-Making Style

These are numbers in [1,5].

rational
intuitive
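
As a rough illustration of how these survey fields might be combined, the sketch below averages one numeric trait within each value of a categorical field. It operates on plain dictionaries keyed by the field names listed above; the helper name `mean_trait_by_group` and the toy values are hypothetical and are not part of ConvoKit or the dataset.

```python
import math
from collections import defaultdict


def mean_trait_by_group(speaker_metas, group_key, trait_key):
    """Average a numeric trait within each value of a categorical field."""
    totals = defaultdict(lambda: [0.0, 0])  # group -> [running sum, count]
    for meta in speaker_metas:
        group = meta.get(group_key)
        value = meta.get(trait_key)
        # Skip speakers with a missing group or a missing/NaN trait value.
        if group is None or value is None:
            continue
        if isinstance(value, float) and math.isnan(value):
            continue
        totals[group][0] += float(value)
        totals[group][1] += 1
    return {group: total / count for group, (total, count) in totals.items()}


# Toy speaker metadata (made-up values, for illustration only).
toy_speakers = [
    {"ideology": "Liberal", "agreeable": 4.0},
    {"ideology": "Liberal", "agreeable": 3.0},
    {"ideology": "Moderate", "agreeable": 2.5},
]
toy_means = mean_trait_by_group(toy_speakers, "ideology", "agreeable")
```

With a loaded corpus, something like `[speaker.meta for speaker in corpus.iter_speakers()]` could supply the dictionaries, assuming the metadata keys match the names above.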

Utterance-level information

Each utterance corresponds to a turn in a dialogue.

id: index of the utterance
speaker: the author of the utterance
conversation_id: id of the first utterance in the dialogue this utterance belongs to
reply_to: the id of the utterance to which this utterance is a reply (None if the utterance starts the conversation)
text: content of the utterance

Additional metadata includes:

role: whether the utterance’s author is the persuader (0) or persuadee (1)
user_turn_id: i, such that this utterance is the particular user’s ith turn in the conversation

In addition, for 6,136 utterances in 300 human-annotated conversations, the following information is provided:

label_1: the dialogue acts in each sentence of the utterance, stored as a list. If the utterance is authored by a persuader, these are persuasion strategies; otherwise, they are dialogue acts particular to the persuadee’s role. np.nan if the utterance is not annotated.
label_2: the second dialogue act in each sentence (available for a limited number of utterances). np.nan if the utterance is not annotated.
sentiment: the sentiment score for each sentence, stored as a dict of lists, where entries correspond to pos, neg and neutral sentiment. np.nan for each sentiment category if the utterance is not annotated.
n_sents: the number of sentences in the utterance. None if the utterance is not annotated.
text_by_sent: a string containing the utterance’s text, where <s> denotes sentence breaks. np.nan if the utterance is not annotated.
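
The per-sentence fields above can be lined up with the sentence text. The following is a minimal sketch, assuming `text_by_sent` uses the literal marker `<s>` between sentences and that `label_1` holds one entry per sentence; the helper names and the example labels are hypothetical placeholders, not values taken from the dataset.

```python
import math


def is_missing(value):
    """True for the None / NaN placeholders used on unannotated utterances."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def sentences_with_labels(meta):
    """Pair each sentence of an annotated utterance with its label_1 entry.

    Returns an empty list when the utterance is not annotated.
    """
    text_by_sent = meta.get("text_by_sent")
    labels = meta.get("label_1")
    if is_missing(text_by_sent) or is_missing(labels):
        return []
    # Split on the sentence-break marker and drop empty fragments.
    sentences = [s.strip() for s in text_by_sent.split("<s>") if s.strip()]
    # zip stops at the shorter sequence if the two lengths disagree.
    return list(zip(sentences, labels))


# Toy utterance metadata (made-up text and placeholder labels).
toy_meta = {
    "text_by_sent": "Hello there. <s> Have you heard of this charity?",
    "label_1": ["example-act-a", "example-act-b"],
}
toy_pairs = sentences_with_labels(toy_meta)
```

The same function could be applied to `utt.meta` for each utterance in a loaded corpus, provided the stored values follow the layout assumed here.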

Conversation-level information

Each conversation contains the following metadata:

dialogue_id: the ID of the conversation in the original dataset.
user_ee: the ID of the user who is the persuadee in this conversation
user_er: the ID of the user who is the persuader in this conversation
donation_ee: the amount donated by the persuadee
donation_er: the amount donated by the persuader
is_annotated: whether or not the conversation is manually annotated

Annotated conversations also contain the following metadata:

intended: the amount that the persuadee intends to donate, as inferred by the annotator. np.nan if the conversation is not annotated.
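
One simple use of the conversation-level fields is summarizing how often, and how much, persuadees donate. The sketch below assumes `donation_ee` is numeric (or a numeric string) and works on plain dictionaries; the function name `donation_summary` and the toy amounts are hypothetical.

```python
def donation_summary(conversation_metas):
    """Summarize persuadee donations across conversations."""
    amounts = [float(meta["donation_ee"]) for meta in conversation_metas]
    n = len(amounts)
    n_donors = sum(1 for amount in amounts if amount > 0)
    return {
        "n_conversations": n,
        "n_donors": n_donors,
        # Guard against division by zero on an empty input.
        "donation_rate": n_donors / n if n else 0.0,
        "mean_donation": sum(amounts) / n if n else 0.0,
    }


# Toy conversation metadata (made-up amounts, for illustration only).
toy_conversations = [
    {"donation_ee": 0.0},
    {"donation_ee": 0.5},
    {"donation_ee": 1.0},
    {"donation_ee": 0.0},
]
toy_summary = donation_summary(toy_conversations)
```

With a loaded corpus, `[convo.meta for convo in corpus.iter_conversations()]` could be passed in place of the toy list.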

Usage

To download directly with ConvoKit:

>>> from convokit import Corpus, download
>>> corpus = Corpus(filename=download("persuasionforgood-corpus"))

For some quick stats:

>>> corpus.print_summary_stats()
Number of Speakers: 1285
Number of Utterances: 20932
Number of Conversations: 1017
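
Beyond the summary statistics, the utterance metadata can be tallied directly. The following is a small sketch, assuming `role` is stored as the integers 0 and 1 described under the utterance-level information; the helper `utterances_per_role` and the `ROLE_NAMES` mapping are illustrative names, not part of ConvoKit.

```python
from collections import Counter

# Mapping taken from the role description: persuader (0), persuadee (1).
ROLE_NAMES = {0: "persuader", 1: "persuadee"}


def utterances_per_role(corpus):
    """Count utterances authored by persuaders and by persuadees."""
    counts = Counter()
    for utt in corpus.iter_utterances():
        role = utt.meta["role"]
        # Any unexpected value is grouped under "unknown".
        counts[ROLE_NAMES.get(role, "unknown")] += 1
    return dict(counts)
```

Calling `utterances_per_role(corpus)` on the corpus loaded above should return a dictionary keyed by role name.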

Additional note

License

Licensed under the Apache License 2.0 (the license for the original dataset can be found here).

Contact

Corpus converted into ConvoKit format by Justine Zhang, with additional work by Frank Li, Grace Deng, Di Ni (fl338@cornell.edu, gd3435@cornell.edu, dn273@cornell.edu).