Persuasion For Good Corpus
===========================
A collection of online conversations collected via Amazon Mechanical Turk, in which one participant (the *persuader*) tries to convince the other (the *persuadee*) to donate to a charity. The dataset contains 1,017 conversations, along with demographic data and responses to psychological surveys for each participant. 300 of these conversations additionally have per-sentence human annotations of persuasion-related dialogue acts and sentiment.
This is a Convokit-formatted version of the dataset originally distributed with the following paper (`link `_, `dataset link `_):
Wang, Xuewei, Weiyan Shi, Richard Kim, Yoojung Oh, Sijia Yang, Jingwen Zhang, and Zhou Yu. "Persuasion for Good: Towards a Personalized Persuasive Dialogue System for Social Good." Proceedings of ACL, 2019.
Dataset details
---------------
Speaker-level information
^^^^^^^^^^^^^^^^^^^^^^^^^
There are 1,285 speakers in this dataset, all of them workers on Amazon Mechanical Turk. Each speaker has associated metadata from pre-task psychological surveys, as well as demographic information:
**Demographics**
* age (numerical; note that a listed age of 3 is likely a data-entry error)
* sex (categorical, in {Male, Female, Other})
* race (categorical, in {White, Other})
* edu (categorical, in {Less than four-year college, Four year college, Postgraduate})
* marital (categorical, in {Married, Unmarried})
* employment (categorical, in {Employed for wages, Other})
* income (numerical)
* religion (categorical, in {Atheist, Catholic, Protestant, Other religion})
* ideology (categorical, in {Conservative, Liberal, Moderate})
**Big-Five Personality Traits**
These are (continuous) numbers in [1,5].
* extrovert
* agreeable
* conscientious
* neurotic
* open
**Moral Foundations**
These are numbers in [1,6].
* care
* fairness
* loyalty
* authority
* purity
* freedom
**Schwartz Portrait Values**
These are numbers in [1,6].
* conform
* tradition
* benevolence
* universalism
* self_direction
* stimulation
* hedonism
* achievement
* power
* security
**Decision-Making Style**
These are numbers in [1,5].
* rational
* intuitive
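Once the corpus is loaded, these fields are available through each speaker's metadata dict (e.g. ``speaker.meta["ideology"]``). As an illustration of how they might be aggregated, the sketch below works on plain dictionaries shaped like the speaker metadata above; the example records and values are invented, not drawn from the corpus:

```python
from collections import defaultdict

# Hypothetical speaker metadata records mirroring the survey fields above;
# the values are invented for illustration, not taken from the corpus.
speakers = [
    {"age": 34, "ideology": "Liberal", "extrovert": 3.2, "rational": 4.0},
    {"age": 52, "ideology": "Conservative", "extrovert": 2.1, "rational": 3.5},
    {"age": 27, "ideology": "Moderate", "extrovert": 4.5, "rational": 4.2},
    {"age": 41, "ideology": "Liberal", "extrovert": 2.8, "rational": 3.0},
]

def mean_trait_by_group(records, group_key, trait_key):
    """Average a numeric trait within each value of a categorical field."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for rec in records:
        totals[rec[group_key]] += rec[trait_key]
        counts[rec[group_key]] += 1
    return {group: totals[group] / counts[group] for group in totals}

means = mean_trait_by_group(speakers, "ideology", "extrovert")
```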
Utterance-level information
^^^^^^^^^^^^^^^^^^^^^^^^^^^
Each utterance corresponds to a turn in a dialogue.
* id: index of the utterance
* speaker: the author of the utterance
* conversation_id: id of the first utterance in the dialogue this utterance belongs to
* reply_to: the id of the utterance to which this utterance is a reply (None if the utterance starts the conversation)
* text: content of the utterance
Additional metadata includes:
* role: whether the utterance's author is the persuader (0) or persuadee (1)
* user_turn_id: the index i such that this utterance is its author's ith turn in the conversation
In addition, for 6,136 utterances in 300 human-annotated conversations, the following information is provided:
* label_1: the dialogue acts in each sentence of the utterance, stored as a list. If the utterance is authored by a persuader, these are persuasion strategies, otherwise these are dialogue acts particular to the persuadee's role. np.nan if the utterance is not annotated.
* label_2: the second dialogue act in each sentence (available for a limited number of utterances). np.nan if the utterance is not annotated.
* sentiment: the sentiment score for each sentence, stored as a dict of lists, where entries correspond to pos, neg and neutral sentiment. np.nan for each sentiment category if the utterance is not annotated.
* n_sents: the number of sentences in the utterance. None if the utterance is not annotated.
* text_by_sent: a string containing the utterance's text, with explicit markers inserted at sentence breaks. np.nan if the utterance is not annotated.
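Because these annotation fields hold np.nan (a float NaN) for unannotated utterances, a small guard is useful before iterating over the per-sentence labels. The sketch below uses a hand-made metadata dict shaped like the fields above; the label strings and sentiment scores are illustrative placeholders, not actual labels from the annotation scheme:

```python
import math

def is_annotated(value):
    """Annotation fields hold np.nan (a float NaN) when absent."""
    return not (isinstance(value, float) and math.isnan(value))

# Hypothetical utterance metadata; label strings are placeholders only.
utt_meta = {
    "role": 0,  # persuader
    "label_1": ["strategy-a", "strategy-b"],
    "sentiment": {"pos": [0.6, 0.1], "neg": [0.0, 0.2], "neutral": [0.4, 0.7]},
    "n_sents": 2,
}

pairs = []
if is_annotated(utt_meta["label_1"]):
    # Pair each sentence's dialogue act with its positive-sentiment score
    pairs = list(zip(utt_meta["label_1"], utt_meta["sentiment"]["pos"]))
```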
Conversation-level information
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Each conversation contains the following metadata:
* dialogue_id: the ID of the conversation in the original dataset
* user_ee: the speaker ID of the persuadee in this conversation
* user_er: the speaker ID of the persuader in this conversation
* donation_ee: the amount donated by the persuadee
* donation_er: the amount donated by the persuader
* is_annotated: whether or not the conversation was manually annotated
Annotated conversations also contain the following metadata:
* intended: the amount that the persuadee intends to donate, as inferred by the annotator. np.nan if the conversation is not annotated.
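These fields make it straightforward to separate annotated conversations and compare donation outcomes. The sketch below operates on plain dictionaries shaped like the conversation metadata above; the records and donation amounts are invented for illustration:

```python
# Hypothetical conversation metadata records; donation amounts are invented.
conversations = [
    {"dialogue_id": "d1", "donation_ee": 0.5, "donation_er": 0.0, "is_annotated": True},
    {"dialogue_id": "d2", "donation_ee": 0.0, "donation_er": 1.0, "is_annotated": False},
    {"dialogue_id": "d3", "donation_ee": 2.0, "donation_er": 0.5, "is_annotated": True},
]

# Keep only the manually annotated conversations
annotated = [c for c in conversations if c["is_annotated"]]

def donation_rate(convs):
    """Fraction of conversations where the persuadee donated a nonzero amount."""
    return sum(1 for c in convs if c["donation_ee"] > 0) / len(convs)

rate_all = donation_rate(conversations)
rate_annotated = donation_rate(annotated)
```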
Usage
-----
To download directly with ConvoKit:
>>> from convokit import Corpus, download
>>> corpus = Corpus(filename=download("persuasionforgood-corpus"))
For some quick stats:
>>> corpus.print_summary_stats()
Number of Speakers: 1285
Number of Utterances: 20932
Number of Conversations: 1017
Additional note
---------------
License
^^^^^^^
Licensed under the Apache License 2.0 (the license for the original dataset can be found `here `_).
Contact
^^^^^^^
Corpus converted into ConvoKit format by Justine Zhang, with additional work by Frank Li, Grace Deng, Di Ni (fl338@cornell.edu, gd3435@cornell.edu, dn273@cornell.edu).