Emotional Support Conversation Corpus

A dataset of approximately 1,300 conversations between emotional support seekers and supporters, annotated with strategy labels, emotion types, and survey scores. The dataset explores the emotional support task, which has applications in areas like mental health support and customer service chats.

Dataset details

Speaker-level information

Each conversation involves exactly two roles: a seeker and a supporter. Speakers are bound to their respective conversation and identified accordingly. Speaker metadata include:

  • role: either seeker (the person sharing their problem) or supporter (the person providing emotional support)

  • dialog_index: index of the conversation this speaker is associated with

Utterance-level information

Each utterance corresponds to one turn in a conversation dialog. For each utterance, we provide:

  • id: unique utterance identifier, formatted as utterance_{conversation_id}_{turn_index}

  • speaker: the speaker who authored the utterance

  • conversation_id: ID of the conversation this utterance belongs to

  • reply_to: ID of the previous utterance (None if the utterance is not a reply)

  • timestamp: not provided in the original dataset

  • text: textual content of the utterance

Metadata for each utterance include:

  • annotation: researcher-provided annotation for the utterance, including strategy labels (e.g., Question, Restatement or Paraphrasing, Emotional Support) for supporter turns and optional feedback scores for seeker turns

Conversational-level information

Each conversation corresponds to a single support session. Metadata associated with conversations include:

  • experience_type: the type of personal experience described (e.g., Previous Experience)

  • emotion_type: the primary emotion type expressed by the seeker (e.g., anxiety, depression)

  • problem_type: the category of problem discussed (e.g., job crisis, family)

  • situation: a brief description of the seeker’s situation

  • survey_score: post-conversation survey scores from both parties, including initial and final emotion intensity, empathy, and relevance ratings

  • seeker_question1: open-ended post-conversation response from the seeker (question 1)

  • seeker_question2: open-ended post-conversation response from the seeker (question 2)

  • supporter_question1: open-ended post-conversation response from the supporter (question 1)

  • supporter_question2: open-ended post-conversation response from the supporter (question 2)

Usage

To download directly with ConvoKit:

>>> from convokit import Corpus, download
>>> corpus = Corpus(filename=download("emotional-support"))

For some quick stats:

>>> corpus.print_summary_stats()
Number of Speakers: 2600
Number of Utterances: 38365
Number of Conversations: 1300

Additional notes

Data License

This dataset is shared under the Creative Commons Attribution-NonCommercial 4.0 International License.

Dataset Access

The original dataset is available here.