Supreme Court Corpus

A collection of conversations from the U.S. Supreme Court Oral Arguments (51,498 utterances, from 204 cases).

Distributed together with: Echoes of power: Language effects and power differences in social interaction. Cristian Danescu-Niculescu-Mizil, Bo Pang, Lillian Lee and Jon Kleinberg. WWW 2012

Dataset details

Speaker-level information

For each speaker, the additional information includes:

  • is-justice: whether the speaker is a Justice

  • gender: gender of the speaker
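The speaker-level fields can be used to separate Justices from the other participants. The sketch below works on a hypothetical plain-dict copy of the speaker metadata rather than on ConvoKit objects. It assumes that `is-justice` is stored as a boolean and `gender` as a string. The speaker ids and values are invented for illustration.

```python
from collections import Counter

# Hypothetical speaker records keyed by speaker id. The field names follow
# the speaker-level metadata above; the ids and values are made up.
SPEAKERS = {
    "justice_a": {"is-justice": True, "gender": "female"},
    "advocate_b": {"is-justice": False, "gender": "male"},
    "advocate_c": {"is-justice": False, "gender": "female"},
    "advocate_d": {"gender": "male"},  # "is-justice" missing on purpose
}


def split_by_role(speaker_meta):
    """Partition speaker ids into (justices, others).

    A speaker whose record lacks "is-justice" is treated as a non-Justice;
    this default is an assumption, not part of the corpus documentation.
    """
    justices, others = [], []
    for speaker_id, meta in speaker_meta.items():
        if meta.get("is-justice", False):
            justices.append(speaker_id)
        else:
            others.append(speaker_id)
    return justices, others


def gender_counts(speaker_meta, speaker_ids):
    """Count gender values among the given speakers (None if absent)."""
    return Counter(speaker_meta[s].get("gender") for s in speaker_ids)


justices, others = split_by_role(SPEAKERS)
print(justices)  # ['justice_a']
print(gender_counts(SPEAKERS, others))
```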

Utterance-level information

For each utterance, we provide:

  • id: index of the utterance

  • speaker: the speaker who authored the utterance

  • conversation_id: id of the first utterance in the conversation this utterance belongs to

  • reply_to: id of the utterance to which this utterance replies (None if the utterance is not a reply)

  • timestamp: time of the utterance

  • text: textual content of the utterance
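The `conversation_id` is the id of a conversation's first utterance. If every other utterance in that conversation has a `reply_to`, then following `reply_to` links from any utterance should end at that first utterance. That invariant is an assumption about the data. The sketch below checks it and groups utterances by conversation. It uses hypothetical plain dicts with the fields listed above rather than ConvoKit's `Utterance` objects, and the sample records are invented.

```python
from collections import defaultdict

# Hypothetical utterances keyed by id; the fields mirror the list above.
UTTERANCES = {
    "u1": {"speaker": "justice_a", "conversation_id": "u1", "reply_to": None,
           "timestamp": 0, "text": "First question."},
    "u2": {"speaker": "advocate_b", "conversation_id": "u1", "reply_to": "u1",
           "timestamp": 1, "text": "An answer."},
    "u3": {"speaker": "justice_a", "conversation_id": "u1", "reply_to": "u2",
           "timestamp": 2, "text": "A follow-up."},
    "u4": {"speaker": "justice_c", "conversation_id": "u4", "reply_to": None,
           "timestamp": 3, "text": "A new exchange."},
}


def find_root(utt_id, utterances):
    """Follow reply_to links from utt_id until reaching a non-reply."""
    seen = set()
    current = utt_id
    while utterances[current]["reply_to"] is not None:
        if current in seen:
            raise ValueError(f"reply_to cycle involving {current!r}")
        seen.add(current)
        current = utterances[current]["reply_to"]
    return current


def check_conversation_ids(utterances):
    """Return ids whose reply chain does not end at their conversation_id."""
    return [
        utt_id
        for utt_id, utt in utterances.items()
        if find_root(utt_id, utterances) != utt["conversation_id"]
    ]


def group_by_conversation(utterances):
    """Map each conversation_id to the ids of its utterances (input order)."""
    groups = defaultdict(list)
    for utt_id, utt in utterances.items():
        groups[utt["conversation_id"]].append(utt_id)
    return dict(groups)


print(check_conversation_ids(UTTERANCES))  # []
print(group_by_conversation(UTTERANCES))   # {'u1': ['u1', 'u2', 'u3'], 'u4': ['u4']}
```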

Metadata for utterances may include:

  • case: case number

  • justice-is-favorable: true if the Justice eventually voted for this side

  • justice-vote: eventual vote from the Justice

  • side: side of the case

Note that some utterances may have only a subset of such information.
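Because metadata fields can be missing, any aggregation over them has to read them defensively. The sketch below tallies `justice-is-favorable` per `side` and counts separately the records that lack either field. It assumes that the flag is stored as a Python boolean. The case and side labels are hypothetical placeholders, not values taken from the corpus.

```python
from collections import defaultdict

# Hypothetical utterance metadata; the "case" and "side" labels are invented.
UTTERANCE_META = [
    {"case": "case-1", "side": "side-x", "justice-is-favorable": True},
    {"case": "case-1", "side": "side-x", "justice-is-favorable": False},
    {"case": "case-1", "side": "side-y", "justice-is-favorable": True},
    {"case": "case-1"},  # no side or favorability recorded
]


def favorability_by_side(metas):
    """Tally justice-is-favorable per side, skipping incomplete records.

    Returns (tallies, skipped), where tallies maps side -> {True: n, False: m}
    and skipped counts records missing "side" or "justice-is-favorable".
    Assumes the flag is stored as a bool.
    """
    tallies = defaultdict(lambda: {True: 0, False: 0})
    skipped = 0
    for meta in metas:
        side = meta.get("side")
        favorable = meta.get("justice-is-favorable")
        if side is None or favorable is None:
            skipped += 1
            continue
        tallies[side][bool(favorable)] += 1
    return dict(tallies), skipped


tallies, skipped = favorability_by_side(UTTERANCE_META)
print(tallies)  # {'side-x': {True: 1, False: 1}, 'side-y': {True: 1, False: 0}}
print(skipped)  # 1
```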


To download directly with ConvoKit:

>>> from convokit import Corpus, download
>>> corpus = Corpus(filename=download("supreme-corpus"))

For some quick stats:

>>> corpus.print_summary_stats()
Number of Speakers: 324
Number of Utterances: 51498
Number of Conversations: 938
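Beyond the summary statistics, you can combine speaker and utterance metadata by iterating over the corpus. The sketch below counts utterances by Justices and by other speakers using `Corpus.iter_utterances()` and each utterance's `speaker.meta`. It assumes that the speaker metadata key is spelled `is-justice`, as in the list above, and that its value is a boolean. Either detail may differ across corpus releases.

```python
from collections import Counter


def utterances_by_role(corpus):
    """Count utterances spoken by Justices versus other speakers.

    `corpus` is expected to behave like a ConvoKit Corpus: it must provide
    iter_utterances(), and each utterance must expose speaker.meta.
    Speakers without an "is-justice" entry are counted as non-Justices.
    """
    counts = Counter()
    for utt in corpus.iter_utterances():
        is_justice = bool(utt.speaker.meta.get("is-justice", False))
        counts["justice" if is_justice else "other"] += 1
    return counts


# Example usage with the downloaded corpus (requires ConvoKit and network access):
# from convokit import Corpus, download
# corpus = Corpus(filename=download("supreme-corpus"))
# print(utterances_by_role(corpus))
```

Every utterance is counted exactly once, so the two counts should sum to the number of utterances reported by `print_summary_stats()`.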

Additional note


Please email any questions to Cristian Danescu-Niculescu-Mizil.