Speaker Conversation Utilities

Various helpers for computing aggregated attributes at the (speaker, conversation) level.

Example usage: speaker conversation attributes

class convokit.speaker_convo_helpers.speaker_convo_attrs.SpeakerConvoAttrs(attr_name, output_field=None, agg_fn=None, recompute=False)

Transformer that aggregates statistics per (speaker, convo) pair, e.g., the average word count of all utterances a speaker contributed to each conversation. Assumes that corpus.organize_speaker_convo_history has already been called.

Parameters

attr_name – name of the utterance attribute to aggregate over. Note that this attribute must already exist as an annotation on the utterances in the corpus.
output_field – name of the aggregated attribute to output. Defaults to attr_name.
agg_fn – function used to aggregate the utterance-level attribute values. Defaults to returning a list.
recompute – if False, the aggregate will not be recomputed for a (speaker, convo) entry that already has output_field.
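A minimal plain-Python sketch of the aggregation described above may help show how the four parameters interact. It assumes utterances are plain dicts with hypothetical `speaker`, `convo`, and `meta` keys, and it returns a dict keyed by (speaker, convo) instead of writing into a Corpus. The function `aggregate_speaker_convo` and its `existing` argument (a stand-in for previously stored entries) are hypothetical illustrations, not ConvoKit's implementation.

```python
from collections import defaultdict
from statistics import mean
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical utterance representation: {"speaker": ..., "convo": ..., "meta": {...}}
Utterance = Dict[str, Any]
EntryTable = Dict[Tuple[str, str], Dict[str, Any]]


def aggregate_speaker_convo(
    utterances: List[Utterance],
    attr_name: str,
    output_field: Optional[str] = None,
    agg_fn: Optional[Callable[[List[Any]], Any]] = None,
    recompute: bool = False,
    existing: Optional[EntryTable] = None,  # hypothetical: previously stored entries
) -> EntryTable:
    """Aggregate utterance-level meta[attr_name] per (speaker, convo) pair."""
    output_field = output_field or attr_name  # default: reuse the attribute name
    agg_fn = agg_fn or list                   # default: collect values into a list
    entries = existing if existing is not None else {}

    # Group the attribute values by (speaker, convo).
    grouped: Dict[Tuple[str, str], List[Any]] = defaultdict(list)
    for utt in utterances:
        grouped[(utt["speaker"], utt["convo"])].append(utt["meta"][attr_name])

    for key, values in grouped.items():
        entry = entries.setdefault(key, {})
        if output_field in entry and not recompute:
            continue  # keep the previously computed aggregate
        entry[output_field] = agg_fn(values)
    return entries


if __name__ == "__main__":
    utts = [
        {"speaker": "alice", "convo": "c1", "meta": {"n_words": 10}},
        {"speaker": "alice", "convo": "c1", "meta": {"n_words": 20}},
        {"speaker": "bob", "convo": "c1", "meta": {"n_words": 5}},
    ]
    # Average word count per (speaker, convo), stored under "avg_words".
    print(aggregate_speaker_convo(utts, "n_words", output_field="avg_words", agg_fn=mean))
```

Passing a reducer such as `statistics.mean` as agg_fn yields one scalar per pair; leaving it unset keeps every raw value, which is useful when the downstream statistic is not yet decided.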

transform(corpus: convokit.model.corpus.Corpus)

Creates and populates the (speaker, convo) aggregates.

Parameters: corpus (Corpus) – the Corpus to transform.

class convokit.speaker_convo_helpers.speaker_convo_lifestage.SpeakerConvoLifestage(lifestage_size, output_field='lifestage')

Transformer that, for each speaker in a conversation, computes the lifestage of the speaker in that conversation. For instance, if lifestages are 20 conversations long, then the first 20 conversations a speaker participates in will be in lifestage 0, and the second 20 will be in lifestage 1.

Assumes that corpus.organize_speaker_convo_history has already been called.

Parameters

lifestage_size – number of conversations in each lifestage
output_field – name of the speaker-conversation attribute to output. Defaults to “lifestage”.
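The lifestage rule above amounts to integer division of a conversation's position in the speaker's history by lifestage_size. The sketch below assumes each speaker's conversations are already listed in participation order (roughly what organize_speaker_convo_history is expected to provide) and uses plain dicts; `compute_lifestages` is a hypothetical helper, not ConvoKit's code.

```python
from typing import Dict, List, Tuple


def compute_lifestages(
    speaker_convos: Dict[str, List[str]],
    lifestage_size: int,
    output_field: str = "lifestage",
) -> Dict[Tuple[str, str], Dict[str, int]]:
    """Assign each (speaker, convo) pair a lifestage index."""
    if lifestage_size <= 0:
        raise ValueError("lifestage_size must be a positive integer")
    entries: Dict[Tuple[str, str], Dict[str, int]] = {}
    for speaker, convos in speaker_convos.items():
        # Assumption: convos is in the order the speaker joined them, so the
        # i-th conversation (0-indexed) falls in lifestage i // lifestage_size.
        for idx, convo_id in enumerate(convos):
            entries[(speaker, convo_id)] = {output_field: idx // lifestage_size}
    return entries


if __name__ == "__main__":
    history = {"alice": [f"c{i}" for i in range(45)]}
    stages = compute_lifestages(history, lifestage_size=20)
    # c0..c19 -> 0, c20..c39 -> 1, c40..c44 -> 2
    print(stages[("alice", "c20")])
```

With lifestage_size=20 this reproduces the example in the description: the first 20 conversations map to lifestage 0 and the next 20 to lifestage 1.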

transform(corpus)

Modify the provided corpus. This is an abstract method that must be implemented by any Transformer subclass.

Parameters: corpus – the Corpus to transform
Returns: the modified version of the input Corpus. Note that, unlike its scikit-learn equivalent, transform() operates in place on the Corpus (though, for convenience and compatibility with scikit-learn, it also returns the modified Corpus).
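The in-place contract of transform() can be illustrated with a toy, self-contained example. `ToyCorpus`, `ToyTransformer`, and `WordCounter` are hypothetical stand-ins for ConvoKit's Corpus and Transformer classes; the sketch only shows that transform() mutates its argument and hands back that same object, so callers may either ignore the return value or chain on it.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class ToyCorpus:
    """Hypothetical stand-in for a Corpus: a list of utterance dicts."""

    def __init__(self, utterances: List[Dict[str, Any]]):
        self.utterances = utterances


class ToyTransformer(ABC):
    """Mimics the documented contract: subclasses must implement transform()."""

    @abstractmethod
    def transform(self, corpus: ToyCorpus) -> ToyCorpus:
        ...


class WordCounter(ToyTransformer):
    """Annotates each utterance with a whitespace-based word count."""

    def transform(self, corpus: ToyCorpus) -> ToyCorpus:
        for utt in corpus.utterances:
            utt["n_words"] = len(utt["text"].split())  # mutates in place
        return corpus  # the same object, returned for convenience


if __name__ == "__main__":
    corpus = ToyCorpus([{"text": "hello there world"}])
    result = WordCounter().transform(corpus)
    print(result is corpus, corpus.utterances[0]["n_words"])
```

Because the returned object is the input itself, no copy of the corpus is made; code that needs the original, unannotated data must copy it before calling transform().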