Persuasion For Good Corpus
===========================
A collection of online conversations collected via Amazon Mechanical Turk, in which one participant (the *persuader*) tries to convince the other (the *persuadee*) to donate to a charity. The dataset contains 1,017 conversations, along with demographic data and responses to psychological surveys for each participant. 300 of these conversations additionally have per-sentence human annotations of persuasion-related dialogue acts and sentiment.
This is a Convokit-formatted version of the dataset originally distributed with the following paper (`link `_, `dataset link `_):
Wang, Xuewei, Weiyan Shi, Richard Kim, Yoojung Oh, Sijia Yang, Jingwen Zhang, and Zhou Yu. "Persuasion for Good: Towards a Personalized Persuasive Dialogue System for Social Good." Proceedings of ACL, 2019.
Dataset details
---------------
Speaker-level information
^^^^^^^^^^^^^^^^^^^^^^^^^
There are 1,285 speakers in this dataset, all of them workers on Amazon Mechanical Turk. Each speaker has associated metadata from pre-task psychological surveys, as well as demographic information:
**Demographics**
* age (numerical; note that a listed age of 3 is likely a data-entry error)
* sex (categorical, in {Male, Female, Other})
* race (categorical, in {White, Other})
* edu (categorical, in {Less than four-year college, Four year college, Postgraduate})
* marital (categorical, in {Married, Unmarried})
* employment (categorical, in {Employed for wages, Other})
* income (numerical)
* religion (categorical, in {Atheist, Catholic, Protestant, Other religion})
* ideology (categorical, in {Conservative, Liberal, Moderate})
**Big-Five Personality Traits**
These are (continuous) numbers in [1,5].
* extrovert
* agreeable
* conscientious
* neurotic
* open
**Moral Foundations**
These are numbers in [1,6].
* care
* fairness
* loyalty
* authority
* purity
* freedom
**Schwartz Portrait Values**
These are numbers in [1,6].
* conform
* tradition
* benevolence
* universalism
* self_direction
* stimulation
* hedonism
* achievement
* power
* security
**Decision-Making Style**
These are numbers in [1,5].
* rational
* intuitive
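Once the corpus is loaded, these fields are available through each speaker's metadata dict (e.g. ``speaker.meta["ideology"]``). As an illustration of how they might be aggregated, the sketch below works on plain dictionaries shaped like the speaker metadata above; the example records and values are invented, not drawn from the corpus:

```python
from collections import defaultdict

# Hypothetical speaker metadata records mirroring the survey fields above;
# the values are invented for illustration, not taken from the corpus.
speakers = [
    {"age": 34, "ideology": "Liberal", "extrovert": 3.2, "rational": 4.0},
    {"age": 52, "ideology": "Conservative", "extrovert": 2.1, "rational": 3.5},
    {"age": 27, "ideology": "Moderate", "extrovert": 4.5, "rational": 4.2},
    {"age": 41, "ideology": "Liberal", "extrovert": 2.8, "rational": 3.0},
]

def mean_trait_by_group(records, group_key, trait_key):
    """Average a numeric trait within each value of a categorical field."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for rec in records:
        totals[rec[group_key]] += rec[trait_key]
        counts[rec[group_key]] += 1
    return {group: totals[group] / counts[group] for group in totals}

means = mean_trait_by_group(speakers, "ideology", "extrovert")
```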
Utterance-level information
^^^^^^^^^^^^^^^^^^^^^^^^^^^
Each utterance corresponds to a turn in a dialogue.
* id: index of the utterance
* speaker: the author of the utterance
* conversation_id: id of the first utterance in the dialogue this utterance belongs to
* reply_to: the id of the utterance to which this utterance is a reply (None if the utterance starts the conversation)
* text: content of the utterance
Additional metadata includes:
* role: whether the utterance's author is the persuader (0) or persuadee (1)
* user_turn_id: the index i such that this utterance is its author's ith turn in the conversation
In addition, for 6,136 utterances in 300 human-annotated conversations, the following information is provided:
* label_1: the dialogue acts in each sentence of the utterance, stored as a list. If the utterance is authored by a persuader, these are persuasion strategies, otherwise these are dialogue acts particular to the persuadee's role. np.nan if the utterance is not annotated.
* label_2: the second dialogue act in each sentence (available for a limited number of utterances). np.nan if the utterance is not annotated.
* sentiment: the sentiment score for each sentence, stored as a dict of lists, where entries correspond to pos, neg and neutral sentiment. np.nan for each sentiment category if the utterance is not annotated.
* n_sents: the number of sentences in the utterance. None if the utterance is not annotated.
* text_by_sent: a string containing the utterance's text, with explicit markers inserted at sentence breaks. np.nan if the utterance is not annotated.
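Because these annotation fields hold np.nan (a float NaN) for unannotated utterances, a small guard is useful before iterating over the per-sentence labels. The sketch below uses a hand-made metadata dict shaped like the fields above; the label strings and sentiment scores are illustrative placeholders, not actual labels from the annotation scheme:

```python
import math

def is_annotated(value):
    """Annotation fields hold np.nan (a float NaN) when absent."""
    return not (isinstance(value, float) and math.isnan(value))

# Hypothetical utterance metadata; label strings are placeholders only.
utt_meta = {
    "role": 0,  # persuader
    "label_1": ["strategy-a", "strategy-b"],
    "sentiment": {"pos": [0.6, 0.1], "neg": [0.0, 0.2], "neutral": [0.4, 0.7]},
    "n_sents": 2,
}

pairs = []
if is_annotated(utt_meta["label_1"]):
    # Pair each sentence's dialogue act with its positive-sentiment score
    pairs = list(zip(utt_meta["label_1"], utt_meta["sentiment"]["pos"]))
```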
Conversation-level information
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Each conversation contains the following metadata:
* dialogue_id: the ID of the conversation in the original dataset
* user_ee: the speaker ID of the persuadee in this conversation
* user_er: the speaker ID of the persuader in this conversation
* donation_ee: the amount donated by the persuadee
* donation_er: the amount donated by the persuader
* is_annotated: whether or not the conversation was manually annotated
Annotated conversations also contain the following metadata:
* intended: the amount that the persuadee intends to donate, as inferred by the annotator. np.nan if the conversation is not annotated.
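These fields make it straightforward to separate annotated conversations and compare donation outcomes. The sketch below operates on plain dictionaries shaped like the conversation metadata above; the records and donation amounts are invented for illustration:

```python
# Hypothetical conversation metadata records; donation amounts are invented.
conversations = [
    {"dialogue_id": "d1", "donation_ee": 0.5, "donation_er": 0.0, "is_annotated": True},
    {"dialogue_id": "d2", "donation_ee": 0.0, "donation_er": 1.0, "is_annotated": False},
    {"dialogue_id": "d3", "donation_ee": 2.0, "donation_er": 0.5, "is_annotated": True},
]

# Keep only the manually annotated conversations
annotated = [c for c in conversations if c["is_annotated"]]

def donation_rate(convs):
    """Fraction of conversations where the persuadee donated a nonzero amount."""
    return sum(1 for c in convs if c["donation_ee"] > 0) / len(convs)

rate_all = donation_rate(conversations)
rate_annotated = donation_rate(annotated)
```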
Usage
-----
To download directly with ConvoKit:
>>> from convokit import Corpus, download
>>> corpus = Corpus(filename=download("persuasionforgood-corpus"))
For some quick stats:
>>> corpus.print_summary_stats()
Number of Speakers: 1285
Number of Utterances: 20932
Number of Conversations: 1017
Additional note
---------------
License
^^^^^^^
Licensed under the Apache License 2.0 (the license for the original dataset can be found `here `_).
Contact
^^^^^^^
Corpus converted into ConvoKit format by Justine Zhang, with additional work by Frank Li, Grace Deng, Di Ni (fl338@cornell.edu, gd3435@cornell.edu, dn273@cornell.edu).