Persuasion For Good Corpus

A collection of online conversations generated by Amazon Mechanical Turk workers, where one participant (the persuader) tries to convince the other (the persuadee) to donate to a charity. This dataset contains 1017 conversations, along with demographic data and responses to psychological surveys from users. Of these, 300 conversations also have per-sentence human annotations of sentiment and of dialogue acts that pertain to the persuasion setting.

This is a ConvoKit-formatted version of the dataset originally distributed with the following paper: Wang, Xuewei, Weiyan Shi, Richard Kim, Yoojung Oh, Sijia Yang, Jingwen Zhang, and Zhou Yu. “Persuasion for Good: Towards a Personalized Persuasive Dialogue System for Social Good.” Proceedings of ACL, 2019.

Dataset details

Speaker-level information

There are 1285 speakers (users) in this dataset, all of them workers on Amazon Mechanical Turk. Each speaker has associated metadata from pre-task psychological surveys, as well as demographic information:

Demographics

  • age (numerical; note that the age 3 is probably a typo)
  • sex (categorical, in {Male, Female, Other})
  • race (categorical, in {White, Other})
  • edu (categorical, in {Less than four-year college, Four year college, Postgraduate})
  • marital (categorical, in {Married, Unmarried})
  • employment (categorical, in {Employed for wages, Other})
  • income (numerical)
  • religion (categorical, in {Atheist, Catholic, Protestant, Other religion})
  • ideology (categorical, in {Conservative, Liberal, Moderate})
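
Because the age field contains at least one implausible value (the age 3 noted above), it may be worth screening demographics before analysis. The following is a minimal sketch, assuming speaker metadata is available as a plain dict keyed by the field names listed here. The helper name has_plausible_age, the bounds of 18 and 100, and the toy records are illustrative assumptions, not part of the dataset.

```python
def has_plausible_age(meta, min_age=18, max_age=100):
    """Return True if meta['age'] is a number within [min_age, max_age].

    The bounds are illustrative assumptions, not documented limits of the dataset.
    """
    age = meta.get("age")
    try:
        age = float(age)
    except (TypeError, ValueError):
        return False  # missing or non-numeric age
    # A NaN age compares False against both bounds, so it is rejected too.
    return min_age <= age <= max_age


# Toy records (hypothetical values) mimicking speaker metadata.
speakers = {
    "s1": {"age": 34, "sex": "Female"},
    "s2": {"age": 3, "sex": "Male"},  # the kind of likely typo noted above
    "s3": {"age": None, "sex": "Other"},
}
kept = [speaker_id for speaker_id, meta in speakers.items() if has_plausible_age(meta)]
```

With ConvoKit, the same check should be applicable to speaker.meta for each speaker returned by corpus.iter_speakers().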

Big-Five Personality Traits

These are (continuous) numbers in [1,5].

  • extrovert
  • agreeable
  • conscientious
  • neurotic
  • open

Moral Foundations

These are numbers in [1,6].

  • care
  • fairness
  • loyalty
  • authority
  • purity
  • freedom

Schwartz Portrait Values

These are numbers in [1,6].

  • conform
  • tradition
  • benevolence
  • universalism
  • self_direction
  • stimulation
  • hedonism
  • achievement
  • power
  • security

Decision-Making Style

These are numbers in [1,5].

  • rational
  • intuitive

Utterance-level information

Each utterance corresponds to a turn in a dialogue.

  • id: index of the utterance
  • speaker: the author of the utterance
  • conversation_id: id of the first utterance in the dialogue this utterance belongs to
  • reply_to: the id of the utterance to which this utterance is a reply (None if the utterance starts the conversation)
  • text: content of the utterance
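
The reply_to field is enough to recover the turn order of a dialogue. The sketch below assumes each utterance is represented as a plain dict with the fields listed above and that every utterance has at most one reply (a linear chain); the function name order_by_reply_chain and the toy ids are hypothetical.

```python
def order_by_reply_chain(utterances):
    """Order one conversation's utterances by following reply_to links from the root.

    Assumes a linear chain: each utterance is the reply_to target of at most one other.
    """
    by_parent = {u["reply_to"]: u for u in utterances}
    ordered = []
    current = by_parent.get(None)  # the root utterance has reply_to == None
    # The length check guards against malformed data that contains a cycle.
    while current is not None and len(ordered) < len(utterances):
        ordered.append(current)
        current = by_parent.get(current["id"])
    return ordered


# Toy utterances (hypothetical ids and text), deliberately out of order.
utterances = [
    {"id": "u2", "reply_to": "u1", "conversation_id": "u0", "text": "Which charity?"},
    {"id": "u0", "reply_to": None, "conversation_id": "u0", "text": "Hello!"},
    {"id": "u1", "reply_to": "u0", "conversation_id": "u0", "text": "Hi, do you donate?"},
]
ordered = order_by_reply_chain(utterances)
ordered_ids = [u["id"] for u in ordered]
```

In this toy data the first utterance's id equals the shared conversation_id, matching the description of conversation_id above.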

Additional metadata includes:

  • role: whether the utterance’s author is the persuader (0) or persuadee (1)
  • user_turn_id: i, such that this utterance is the particular user’s ith turn in the conversation

In addition, for 6,136 utterances in 300 human-annotated conversations, the following information is provided:

  • label_1: the dialogue acts in each sentence of the utterance, stored as a list. If the utterance is authored by a persuader, these are persuasion strategies, otherwise these are dialogue acts particular to the persuadee’s role. np.nan if the utterance is not annotated.
  • label_2: the second dialogue act in each sentence (available for a limited number of utterances). np.nan if the utterance is not annotated.
  • sentiment: the sentiment score for each sentence, stored as a dict of lists, where entries correspond to pos, neg and neutral sentiment. np.nan for each sentiment category if the utterance is not annotated.
  • n_sents: the number of sentences in the utterance. None if the utterance is not annotated.
  • text_by_sent: a string containing the utterance’s text, where <s> denotes sentence breaks. np.nan if the utterance is not annotated.
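
The per-sentence fields can be aligned by splitting text_by_sent on the <s> marker. The sketch below assumes utterance metadata is a plain dict, that <s> appears only between sentences, and that label_1 has one entry per sentence; the helper names and the labels act_a and act_b are placeholders, not labels from the dataset.

```python
import math


def is_missing(value):
    """True for None or a float NaN (np.nan is itself a Python float)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def sentences_with_labels(meta):
    """Pair each sentence of an annotated utterance with its label_1 entry.

    Returns an empty list for unannotated utterances.
    """
    if is_missing(meta.get("text_by_sent")) or is_missing(meta.get("label_1")):
        return []
    sentences = [s.strip() for s in meta["text_by_sent"].split("<s>") if s.strip()]
    # zip stops at the shorter sequence, so a length mismatch is silently truncated.
    return list(zip(sentences, meta["label_1"]))


# Toy metadata (hypothetical text and placeholder labels).
annotated = {
    "text_by_sent": "Hello there. <s> Would you like to donate?",
    "label_1": ["act_a", "act_b"],
    "n_sents": 2,
}
unannotated = {"text_by_sent": float("nan"), "label_1": float("nan"), "n_sents": None}

pairs = sentences_with_labels(annotated)
no_pairs = sentences_with_labels(unannotated)
```

Checking for both None and NaN in one helper is a deliberate choice here, since the fields above use np.nan in some cases and None in others.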

Conversation-level information

Each conversation contains the following metadata:

  • dialogue_id: the ID of the conversation in the original dataset
  • user_ee: the ID of the user who is the persuadee in this conversation
  • user_er: the ID of the user who is the persuader in this conversation
  • donation_ee: the amount donated by the persuadee
  • donation_er: the amount donated by the persuader
  • is_annotated: whether or not the conversation is manually annotated

The following metadata is populated only for annotated conversations:

  • intended: the amount that the persuadee intends to donate, as inferred by the annotator. np.nan if the conversation is not annotated.
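
Conversation-level fields lend themselves to simple outcome summaries. The following is a rough sketch, assuming conversation metadata is available as plain dicts with numeric donation values; donation_summary and intended_vs_actual are hypothetical helper names, and all values in the toy data are made up.

```python
def donation_summary(conversation_metas):
    """Summarize persuadee donations across conversations."""
    donations = [float(m["donation_ee"]) for m in conversation_metas]
    return {
        "n_conversations": len(donations),
        "n_donors": sum(1 for d in donations if d > 0),
        "mean_donation": sum(donations) / len(donations) if donations else 0.0,
    }


def intended_vs_actual(conversation_metas):
    """Return (intended, donation_ee) pairs for annotated conversations only."""
    return [
        (m["intended"], m["donation_ee"])
        for m in conversation_metas
        if m["is_annotated"]
    ]


# Toy conversation metadata (hypothetical values).
metas = [
    {"donation_ee": 0.5, "donation_er": 0.0, "is_annotated": True, "intended": 0.5},
    {"donation_ee": 0.0, "donation_er": 1.0, "is_annotated": False, "intended": float("nan")},
    {"donation_ee": 1.0, "donation_er": 0.0, "is_annotated": True, "intended": 2.0},
]
summary = donation_summary(metas)
pairs = intended_vs_actual(metas)
```

Filtering on is_annotated rather than testing intended for NaN keeps the second helper independent of how missing values are encoded.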

Usage

To download directly with ConvoKit:

>>> from convokit import Corpus, download
>>> corpus = Corpus(filename=download("persuasionforgood-corpus"))

For some quick stats:

>>> corpus.print_summary_stats()
Number of Speakers: 1285
Number of Utterances: 20932
Number of Conversations: 1017

Additional note

License

Licensed under the Apache License 2.0 (license for original dataset found here).

Contact

Corpus converted into ConvoKit format by Justine Zhang, with additional work by Frank Li, Grace Deng, Di Ni (fl338@cornell.edu, gd3435@cornell.edu, dn273@cornell.edu).