Pairer

The Pairer transformer annotates the Corpus with the pairing information needed to run some of the paired prediction analyses (see, e.g., the documentation for the PairedPrediction and PairedVectorPrediction transformers).

To explain how the Pairer works in more detail, consider the example of the Friends TV series, which is also referenced in the documentation for the paired prediction transformers. We are interested in examining how differently Rachel talks to Monica and to Chandler. If we compare all of Rachel’s utterances to Monica with all of her utterances to Chandler, the differences we observe may be due to different conversation topics rather than to the person being addressed. To control for conversational context, one might therefore focus on utterances from conversations in which all three (Rachel, Monica, and Chandler) are present. More precisely, we would like to pair Rachel’s utterances directed to Monica with her utterances directed to Chandler if they are part of the same conversation.

Next, we show how to set up the pairing from this example using the Pairer transformer:

  • The obj_type is “utterance”, since we compare Rachel’s utterances.

  • The pairing_func extracts the identifier that marks an object as belonging to a particular pair. In this case, that would be the Utterance’s conversation id, since we want to pair utterances from the same conversation.

  • We need to distinguish between utterances where Rachel speaks to Monica and those where she speaks to Chandler. The pos_label_func and neg_label_func specify this (e.g. lambda utt: utt.meta['target'] == 'Monica' for the positive label), where positive instances might arbitrarily refer to targeting Monica, and negative instances to targeting Chandler.

  • pair_mode denotes how many pairs to use per context. For example, a Conversation will likely have Rachel address Monica and Chandler multiple times each. This means that there are multiple positive and negative instances that can be used to form pairs. We could randomly pick one pair of instances (“random”), pick the first pair of instances (“first”), or form the maximum number of pairs (“maximize”).
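The setup described in the list above could be sketched as follows. This is an illustration under stated assumptions rather than code from ConvoKit's own examples: it assumes each utterance stores its addressee under the 'target' metadata key with the values "Monica" and "Chandler", and that utterances expose their conversation id as conversation_id. The SimpleNamespace objects are hypothetical stand-ins so the three functions can be tried without loading a corpus.

```python
from types import SimpleNamespace


def pairing_func(utt):
    # Pair utterances that belong to the same conversation.
    # Assumption: the utterance exposes its conversation id as `conversation_id`.
    return str(utt.conversation_id)


def pos_label_func(utt):
    # Positive instances (an arbitrary choice): utterances addressed to Monica.
    # Assumption: the addressee is stored under the 'target' metadata key.
    return utt.meta.get("target") == "Monica"


def neg_label_func(utt):
    # Negative instances: utterances addressed to Chandler.
    return utt.meta.get("target") == "Chandler"


def build_pairer(pair_mode="maximize"):
    # ConvoKit is imported lazily so the functions above can be tested
    # even where the library is not installed.
    from convokit.paired_prediction.pairer import Pairer

    return Pairer(
        obj_type="utterance",
        pairing_func=pairing_func,
        pos_label_func=pos_label_func,
        neg_label_func=neg_label_func,
        pair_mode=pair_mode,
    )


# Hypothetical stand-in utterances from the same conversation.
to_monica = SimpleNamespace(conversation_id="conv_1", meta={"target": "Monica"})
to_chandler = SimpleNamespace(conversation_id="conv_1", meta={"target": "Chandler"})
```

Both stand-ins map to the same pairing key, and exactly one label function accepts each of them, which is what allows them to form a positive/negative pair.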

Pairer saves this pairing information into the object metadata.

  • pair_id is the “id” that uniquely identifies a pair of positive and negative instances. It is the output of the pairing_func (with an index suffix when pair_mode is “maximize”).

  • pair_obj_label denotes whether the object is the positive or the negative instance of the pair.

  • pair_orientation denotes whether to use the pair itself as a positive or negative data point in a predictive classifier. “pos” means the difference between the objects in the pair should be computed as [+ve obj features] - [-ve obj features], and “neg” means it should be computed as [-ve obj features] - [+ve obj features].
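The orientation rule above can be illustrated with a small sketch. The function name pair_difference and the plain-list feature vectors are hypothetical choices made for this example; the sketch only shows the sign convention and is not how the paired prediction transformers compute features internally.

```python
def pair_difference(pos_features, neg_features, orientation):
    """Return the feature difference for one pair, signed by its orientation."""
    if len(pos_features) != len(neg_features):
        raise ValueError("feature vectors must have the same length")
    if orientation == "pos":
        # [+ve obj features] - [-ve obj features]
        return [p - n for p, n in zip(pos_features, neg_features)]
    if orientation == "neg":
        # [-ve obj features] - [+ve obj features]
        return [n - p for p, n in zip(pos_features, neg_features)]
    raise ValueError(f"unknown orientation: {orientation!r}")


pos_vec = [3.0, 1.0]
neg_vec = [1.0, 4.0]
diff_pos = pair_difference(pos_vec, neg_vec, "pos")  # [2.0, -3.0]
diff_neg = pair_difference(pos_vec, neg_vec, "neg")  # [-2.0, 3.0]
```

The two orientations yield vectors that are exact negations of each other, so the same pair can serve as either a positive or a negative data point for a classifier.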

class convokit.paired_prediction.pairer.Pairer(obj_type: str, pairing_func: Callable[[convokit.model.corpusComponent.CorpusComponent], str], pos_label_func: Callable[[convokit.model.corpusComponent.CorpusComponent], bool], neg_label_func: Callable[[convokit.model.corpusComponent.CorpusComponent], bool], pair_mode: str = 'random', pair_id_attribute_name: str = 'pair_id', label_attribute_name: str = 'pair_obj_label', pair_orientation_attribute_name: str = 'pair_orientation')

The Pairer transformer sets up the pairing used by paired prediction analyses.

Parameters
  • obj_type – type of Corpus object to pair: ‘conversation’, ‘speaker’, or ‘utterance’

  • pairing_func – the Corpus object characteristic to pair on, e.g. to pair on the first 10 characters of a well-structured id, use lambda obj: obj.id[:10]

  • pos_label_func – the function to check if the object is a positive instance

  • neg_label_func – the function to check if the object is a negative instance

  • pair_mode – ‘random’: randomly pick a single pair of positive and negative objects (default); ‘maximize’: randomly form as many pairs of positive and negative objects as possible; or ‘first’: pick the first pair of positive and negative objects found.

  • pair_id_attribute_name – metadata attribute name to use in annotating object with pair id, default: “pair_id”. The value is determined by the output of pairing_func. If pair_mode is ‘maximize’, the value is the output of pairing_func + “_[i]”, where i is the ith pair extracted from a given context.

  • label_attribute_name – metadata attribute name to use in annotating object with whether it is positive or negative, default: “pair_obj_label”

  • pair_orientation_attribute_name – metadata attribute name to use in annotating object with pair orientation, default: “pair_orientation”
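To make the three pair_mode options and the pair id naming more concrete, here is a pure-Python sketch of one way they could behave for a single pairing context. It is not ConvoKit's implementation: the helper names select_pairs and make_pair_ids, the seed argument, and the reading of “_[i]” as an underscore followed by a zero-based index are all assumptions made for illustration.

```python
import random


def select_pairs(pos_ids, neg_ids, pair_mode="random", seed=None):
    """Choose (positive, negative) pairs from one pairing context."""
    if not pos_ids or not neg_ids:
        return []  # a pair needs at least one instance of each kind
    rng = random.Random(seed)
    if pair_mode == "first":
        return [(pos_ids[0], neg_ids[0])]
    if pair_mode == "random":
        return [(rng.choice(pos_ids), rng.choice(neg_ids))]
    if pair_mode == "maximize":
        pos, neg = list(pos_ids), list(neg_ids)
        rng.shuffle(pos)
        rng.shuffle(neg)
        # zip stops at the shorter list, giving min(len(pos), len(neg)) pairs
        return list(zip(pos, neg))
    raise ValueError(f"unknown pair_mode: {pair_mode!r}")


def make_pair_ids(context_id, num_pairs, pair_mode):
    """Name the pairs; 'maximize' appends an index to keep ids unique."""
    if pair_mode == "maximize":
        # Assumption: "_[i]" means an underscore plus a zero-based index.
        return [f"{context_id}_{i}" for i in range(num_pairs)]
    return [context_id] * num_pairs


pos_ids = ["p1", "p2", "p3"]
neg_ids = ["n1", "n2"]
first_pairs = select_pairs(pos_ids, neg_ids, "first")
max_pairs = select_pairs(pos_ids, neg_ids, "maximize", seed=0)
max_pair_ids = make_pair_ids("conv_1", len(max_pairs), "maximize")
```

With three positive and two negative instances, “maximize” can form at most two pairs, and each object is used in at most one pair.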

transform(corpus: convokit.model.corpus.Corpus, selector: Callable[[convokit.model.corpusComponent.CorpusComponent], bool] = <function Pairer.<lambda>>) → convokit.model.corpus.Corpus

Annotate corpus objects with pair information (label, pair_id, pair_orientation), with an optional selector indicating which objects should be considered for pairing.

Parameters
  • corpus – target Corpus

  • selector – a (lambda) function that takes a Corpus object and returns a bool (True = include)

Returns

annotated Corpus
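After transform has run, the annotations can be read back from each object's metadata. The sketch below is a hedged illustration: rachel_selector and collect_pairs are hypothetical helpers, the speaker id "Rachel" and the label values "pos" and "neg" are assumptions, and the SimpleNamespace objects stand in for annotated utterances so the example runs without a corpus.

```python
from collections import defaultdict
from types import SimpleNamespace


def rachel_selector(utt):
    # Only consider Rachel's utterances for pairing.
    # Assumption: the speaker's id is the string "Rachel".
    return utt.speaker.id == "Rachel"


def collect_pairs(objs, pair_id_attr="pair_id", label_attr="pair_obj_label"):
    """Group annotated objects into {pair_id: {label: object id}}."""
    pairs = defaultdict(dict)
    for obj in objs:
        pair_id = obj.meta.get(pair_id_attr)
        label = obj.meta.get(label_attr)
        if pair_id is None or label is None:
            continue  # object was not selected for any pair
        pairs[pair_id][label] = obj.id
    return dict(pairs)


# Hypothetical stand-ins for utterances that have already been annotated.
rachel = SimpleNamespace(id="Rachel")
utts = [
    SimpleNamespace(id="u1", speaker=rachel,
                    meta={"pair_id": "conv_1", "pair_obj_label": "pos"}),
    SimpleNamespace(id="u2", speaker=rachel,
                    meta={"pair_id": "conv_1", "pair_obj_label": "neg"}),
    SimpleNamespace(id="u3", speaker=rachel, meta={}),
]
pairs = collect_pairs(utts)
```

With a real corpus, the same helper could be applied after calling transform(corpus, selector=rachel_selector) on a configured Pairer, for example by passing corpus.iter_utterances() to collect_pairs.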