Speaker Conversation Utilities

Helpers for computing aggregated attributes at the (speaker, conversation) level.

Example usage: speaker conversation attributes

class convokit.speaker_convo_helpers.speaker_convo_attrs.SpeakerConvoAttrs(attr_name, output_field=None, agg_fn=None, recompute=False)

Transformer that aggregates statistics per (speaker, convo) pair, e.g., the average word count of all utterances a speaker contributed to each conversation. Assumes that corpus.organize_speaker_convo_history has already been called.

  • attr_name – name of the attribute to aggregate over. Note that this attribute must already exist as an annotation on utterances in the corpus.

  • output_field – name of the aggregated attribute to output. Defaults to attr_name.

  • agg_fn – function used to aggregate the utterance-level attribute. Defaults to returning a list.

  • recompute – if False, the aggregate is not recomputed when output_field already exists for a (speaker, convo) entry.

transform(corpus: convokit.model.corpus.Corpus)

Creates and populates the (speaker, convo) aggregates.

Parameters: corpus (Corpus) – the Corpus to transform.
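
The aggregation described above can be illustrated without ConvoKit at all. The following is a minimal plain-Python sketch, not the library's implementation: the tuple layout, the sample data, and the helper name aggregate_per_speaker_convo are all hypothetical. It only shows the idea of grouping an utterance-level attribute by (speaker, convo) and applying agg_fn, with a list as the default result.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical stand-in for annotated utterances: (speaker_id, convo_id, wordcount).
utterances = [
    ("alice", "c1", 10),
    ("alice", "c1", 20),
    ("bob", "c1", 5),
    ("alice", "c2", 8),
]


def aggregate_per_speaker_convo(rows, agg_fn=None):
    """Group attribute values by (speaker, convo), then apply agg_fn to each group."""
    if agg_fn is None:
        agg_fn = list  # mirrors the documented default: return the values as a list
    grouped = defaultdict(list)
    for speaker_id, convo_id, value in rows:
        grouped[(speaker_id, convo_id)].append(value)
    return {key: agg_fn(values) for key, values in grouped.items()}


# Average word count per (speaker, convo), as in the example from the description.
mean_wordcounts = aggregate_per_speaker_convo(utterances, agg_fn=mean)

# With no agg_fn, each entry is simply the list of utterance-level values.
raw_wordcounts = aggregate_per_speaker_convo(utterances)
```

Passing a different callable (for example max or len) changes only the final reduction step; the grouping stays the same.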

class convokit.speaker_convo_helpers.speaker_convo_lifestage.SpeakerConvoLifestage(lifestage_size, output_field='lifestage')

Transformer that, for each speaker in a conversation, computes the lifestage of the speaker in that conversation. For instance, if lifestages are 20 conversations long, then the first 20 conversations a speaker participates in will be in lifestage 0, and the second 20 will be in lifestage 1.

Assumes that corpus.organize_speaker_convo_history has already been called.

  • lifestage_size – number of conversations in each lifestage

  • output_field – name of the speaker conversation attribute to output; defaults to 'lifestage'
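
The lifestage example above (20 conversations per lifestage, the first 20 in lifestage 0, the next 20 in lifestage 1) is consistent with integer division of a zero-based conversation index. The sketch below assumes exactly that; the function name lifestage_of and the sample history are hypothetical, and the formula is inferred from the description rather than taken from the library's source.

```python
def lifestage_of(convo_index, lifestage_size):
    """Map a zero-based, chronological conversation index to its lifestage."""
    if lifestage_size <= 0:
        raise ValueError("lifestage_size must be positive")
    return convo_index // lifestage_size


# Hypothetical history: one speaker's conversations in chronological order.
convo_ids = [f"convo_{i}" for i in range(45)]

# Assign a lifestage to each conversation, using lifestages of 20 conversations.
lifestages = {cid: lifestage_of(i, 20) for i, cid in enumerate(convo_ids)}
```

Under this assumption a final, partially filled lifestage (here conversations 40 to 44) still receives its own index.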


transform(corpus: convokit.model.corpus.Corpus)

Modify the provided corpus. This is an abstract method that must be implemented by any Transformer subclass.


Parameters: corpus – the Corpus to transform.

Returns: the modified version of the input Corpus. Note that unlike the scikit-learn equivalent, transform() operates in place on the Corpus (though for convenience and compatibility with scikit-learn, it also returns the modified Corpus).
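
The in-place-and-return convention can be sketched with toy classes. ToyCorpus and ToyTransformer below are hypothetical stand-ins, not ConvoKit classes; the point is only that the returned object is the same object that was passed in, so existing references see the changes.

```python
class ToyCorpus:
    """Hypothetical stand-in for a corpus: just a metadata dictionary."""

    def __init__(self):
        self.meta = {}


class ToyTransformer:
    """Hypothetical transformer following the in-place-and-return convention."""

    def transform(self, corpus):
        corpus.meta["annotated"] = True  # mutate the input object directly
        return corpus  # return the same object so calls can be chained


corpus = ToyCorpus()
result = ToyTransformer().transform(corpus)

# Both names refer to one object; no copy was made.
same_object = result is corpus
```

Because nothing is copied, code that needs the untransformed state has to keep its own copy before calling transform().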