Paired Prediction

At a high level, Paired Prediction is a quasi-experimental method that controls for certain priors. See Cheng et al. 2014 for an illustrated example of Paired Prediction in research.

As an illustrative example, consider the Friends TV series, where we might want to examine how Rachel talks to Monica and Chandler differently. At one level, we might just look at the differences between the utterances where Rachel speaks to Monica and those where Rachel speaks to Chandler. But this may inadvertently surface differences that arise because Rachel interacts with Monica and Chandler separately, in different settings and scenarios, and thus highlight only uninteresting differences in the topics discussed.

Instead, we might want to look for subtler differences in speech, perhaps controlling for topic. One way to do this is to look only at Conversations where Rachel, Monica, and Chandler are all present. We would then compare the utterances where Rachel speaks to Monica with those where Rachel speaks to Chandler within each such Conversation, and look for differences between these paired sets of utterances.
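The pairing idea can be sketched in plain Python, independent of ConvoKit. The toy utterance tuples and the helper `pair_by_conversation` below are hypothetical. As a simplification, the sketch keeps only conversations in which the speaker addresses both targets, which is a slightly stricter condition than all three merely being present.

```python
from collections import defaultdict

# Hypothetical toy data: (conversation_id, speaker, addressee, text).
utterances = [
    ("c1", "Rachel", "Monica", "Did you move my shoes?"),
    ("c1", "Rachel", "Chandler", "Very funny."),
    ("c2", "Rachel", "Monica", "I got the job!"),
    ("c3", "Rachel", "Chandler", "Can I borrow your laptop?"),
    ("c3", "Rachel", "Monica", "Tell him to say yes."),
    ("c3", "Monica", "Chandler", "Say yes."),
]

def pair_by_conversation(utts, speaker, target_a, target_b):
    """Return {conversation_id: (texts to target_a, texts to target_b)},
    keeping only conversations where the speaker addresses both targets."""
    grouped = defaultdict(lambda: ([], []))
    for convo_id, spk, addressee, text in utts:
        if spk != speaker:
            continue  # only the focal speaker's utterances are compared
        if addressee == target_a:
            grouped[convo_id][0].append(text)
        elif addressee == target_b:
            grouped[convo_id][1].append(text)
    # Drop conversations that lack one side of the pair (e.g. "c2").
    return {cid: pair for cid, pair in grouped.items() if pair[0] and pair[1]}

pairs = pair_by_conversation(utterances, "Rachel", "Monica", "Chandler")
for convo_id, (to_monica, to_chandler) in sorted(pairs.items()):
    print(convo_id, to_monica, to_chandler)
```

Because both sides of each pair come from the same conversation, differences between them are less likely to reflect topic or setting alone.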

Documentation for the two transformers that perform the paired prediction task is presented below. The PairedPrediction transformer uses the metadata features of Corpus component objects for prediction, while the PairedVectorPrediction transformer uses vector data associated with the objects. See also the documentation for the Pairer transformer, which sets up the pairs needed in a paired prediction analysis.

Example usage: Using Hyperconvo features to predict conversation growth on Reddit in a paired setting

class convokit.paired_prediction.pairedPrediction.PairedPrediction(obj_type: str, pred_feats: List[str], clf=None, pair_id_attribute_name: str = 'pair_id', label_attribute_name: str = 'pair_obj_label', pair_orientation_attribute_name: str = 'pair_orientation')

Paired Prediction is a quasi-experimental method that controls for certain priors. See Cheng et al. 2014 (https://cs.stanford.edu/people/jure/pubs/disqus-icwsm14.pdf) for an illustrated example of Paired Prediction in research.

See Pairer’s documentation for more information about pairing.

Parameters

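obj_type – corpus component type being used for analysis: ‘utterance’, ‘speaker’, or ‘conversation’
pred_feats – list of metadata attributes (i.e. predictive features) to be used in prediction. Features can either be values or a dictionary of key-value pairs.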
clf – optional classifier to be used in the paired prediction
pair_id_attribute_name – metadata attribute name to use in annotating object with pair id, default: “pair_id”
label_attribute_name – metadata attribute name to use in annotating object with predicted label, default: “pair_obj_label”
pair_orientation_attribute_name – metadata attribute name to use in annotating object with pair orientation, default: “pair_orientation”
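To make the roles of the three attribute names concrete, here is a minimal sketch of one common way to turn paired objects into training examples: subtract one object's feature vector from the other's, with the pair orientation deciding the direction and the class label. This scheme is an assumption for illustration and is not taken from the library's source. The `pos`/`neg` values, the feature names, and the helper `build_paired_examples` are hypothetical.

```python
# Hypothetical metadata for four objects forming two pairs. The keys mirror the
# default attribute names in the signature; the feature names are made up.
objects = [
    {"pair_id": "p1", "pair_obj_label": "pos", "pair_orientation": "pos",
     "n_replies": 12.0, "n_speakers": 5.0},
    {"pair_id": "p1", "pair_obj_label": "neg", "pair_orientation": "pos",
     "n_replies": 3.0, "n_speakers": 2.0},
    {"pair_id": "p2", "pair_obj_label": "pos", "pair_orientation": "neg",
     "n_replies": 8.0, "n_speakers": 4.0},
    {"pair_id": "p2", "pair_obj_label": "neg", "pair_orientation": "neg",
     "n_replies": 10.0, "n_speakers": 1.0},
]
pred_feats = ["n_replies", "n_speakers"]

def build_paired_examples(objs, feats):
    """Build one difference vector and one binary label per pair (assumed scheme)."""
    by_pair = {}
    for obj in objs:
        by_pair.setdefault(obj["pair_id"], {})[obj["pair_obj_label"]] = obj
    X, y = [], []
    for pair_id in sorted(by_pair):
        pos, neg = by_pair[pair_id]["pos"], by_pair[pair_id]["neg"]
        pos_vec = [pos[f] for f in feats]
        neg_vec = [neg[f] for f in feats]
        if pos["pair_orientation"] == "pos":
            # Orientation "pos": positive minus negative, labelled 1.
            X.append([p - n for p, n in zip(pos_vec, neg_vec)])
            y.append(1)
        else:
            # Orientation "neg": negative minus positive, labelled 0.
            X.append([n - p for p, n in zip(pos_vec, neg_vec)])
            y.append(0)
    return X, y

X, y = build_paired_examples(objects, pred_feats)
print(X, y)
```

Mixing orientations keeps both classes present in the training data, so a classifier cannot succeed simply by always predicting the same label.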

fit(corpus: convokit.model.corpus.Corpus, y=None, selector: Callable[[convokit.model.corpusComponent.CorpusComponent], bool] = <function PairedPrediction.<lambda>>)

Fit the internal classifier on the paired object features, with an optional selector that chooses which Corpus component objects to include in the analysis.

Parameters

corpus – target Corpus
selector – a (lambda) function that takes a Corpus component object and returns a bool: True if the object is to be included in the paired prediction. By default, includes all objects.

Returns

fitted PairedPrediction Transformer
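A selector is simply a predicate over Corpus component objects. The sketch below illustrates the shape of such a function using a hypothetical stand-in class, `ToyComponent`, with a `meta` dictionary. It does not use ConvoKit itself, and the attribute layout is an assumption.

```python
from dataclasses import dataclass, field

@dataclass
class ToyComponent:
    """Hypothetical stand-in for a Corpus component with a metadata dict."""
    id: str
    meta: dict = field(default_factory=dict)

components = [
    ToyComponent("a", {"pair_id": "p1"}),
    ToyComponent("b", {"pair_id": None}),
    ToyComponent("c", {}),
]

# Default-style selector: include every object.
include_all = lambda obj: True

# Custom selector: keep only objects that have been assigned to a pair.
has_pair = lambda obj: obj.meta.get("pair_id") is not None

selected = [obj.id for obj in components if has_pair(obj)]
print(selected)
```

A selector of this shape would be passed as the `selector` argument; objects for which it returns False are left out of the analysis.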

get_coefs(feature_names: List[str], coef_func=None)

Get dataframe of classifier coefficients.

Parameters

feature_names – list of feature names to get coefficients for
coef_func – function for accessing the list of coefficients from the classifier model; by default, assumes it is a pipeline with a logistic regression component

Returns

DataFrame of features and coefficients, indexed by feature names
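As a rough sketch of what a custom `coef_func` might look like, the example below pulls the coefficient list out of a stand-in classifier and lines it up with feature names. The stand-in mimics the `named_steps` and `coef_` attributes of a scikit-learn pipeline with a linear model. The step name `"logreg"`, the feature names, and the helper `coef_table` are assumptions, and a plain dict stands in for the DataFrame.

```python
from types import SimpleNamespace

# Stand-in for a fitted pipeline. The step name "logreg" is hypothetical.
toy_clf = SimpleNamespace(
    named_steps={"logreg": SimpleNamespace(coef_=[[0.8, -1.5, 0.1]])}
)

def coef_func(clf):
    """Return the list of coefficients from the (assumed) logistic regression step."""
    return clf.named_steps["logreg"].coef_[0]

def coef_table(feature_names, clf, coef_func):
    """Map each feature name to its coefficient, in the order given."""
    return dict(zip(feature_names, coef_func(clf)))

table = coef_table(["n_replies", "depth", "n_speakers"], toy_clf, coef_func)
print(table)
```

The order of `feature_names` has to match the order of the features the classifier was fitted on, otherwise coefficients are attributed to the wrong features.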

summarize(corpus: convokit.model.corpus.Corpus, selector: Callable[[convokit.model.corpusComponent.CorpusComponent], bool] = <function PairedPrediction.<lambda>>, cv=sklearn.model_selection.KFold)

Run PairedPrediction on the corpus with cross-validation and return the mean cross-validation score.

Parameters

corpus – target Corpus (must be annotated with pair information using Pairer.transform())
selector – a (lambda) function that takes a Corpus component object and returns a bool: True if the object is to be included in the summary. By default, includes all objects.
cv – optional CV model: default is KFold(n_splits=5, shuffle=True)

Returns

cross-validation accuracy score
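The idea of a mean cross-validation accuracy over five shuffled folds can be sketched without any dependencies. The hand-rolled `k_fold_indices` below is a simplified stand-in for scikit-learn's KFold, and `centroid_predict` is a toy classifier chosen only to keep the example short. Neither is the library's implementation, and the data are made up.

```python
import random

def k_fold_indices(n, n_splits=5, shuffle=True, seed=0):
    """Yield (train_idx, test_idx) index lists for each fold."""
    indices = list(range(n))
    if shuffle:
        random.Random(seed).shuffle(indices)
    folds = [indices[i::n_splits] for i in range(n_splits)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for k, fold in enumerate(folds) if k != i for j in fold]
        yield train_idx, test_idx

def centroid_predict(train_x, train_y, x):
    """Toy classifier: predict the label whose training mean is closest to x."""
    means = {}
    for label in set(train_y):
        vals = [v for v, l in zip(train_x, train_y) if l == label]
        means[label] = sum(vals) / len(vals)
    return min(means, key=lambda label: abs(means[label] - x))

# Hypothetical one-dimensional paired-difference features and orientation labels.
X = [2.0, 3.0, 4.0, 5.0, 6.0, -2.0, -3.0, -4.0, -5.0, -6.0]
y = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

scores = []
for train_idx, test_idx in k_fold_indices(len(X)):
    train_x = [X[i] for i in train_idx]
    train_y = [y[i] for i in train_idx]
    correct = sum(centroid_predict(train_x, train_y, X[i]) == y[i] for i in test_idx)
    scores.append(correct / len(test_idx))  # accuracy on the held-out fold

mean_accuracy = sum(scores) / len(scores)
print(mean_accuracy)
```

Each object is held out exactly once, so the mean over folds estimates accuracy on pairs the classifier has not seen.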

transform(corpus: convokit.model.corpus.Corpus) → convokit.model.corpus.Corpus

PairedPrediction does not add any annotations to the Corpus.

class convokit.paired_prediction.pairedVectorPrediction.PairedVectorPrediction(obj_type: str, vector_name: str, clf=None, pair_id_attribute_name: str = 'pair_id', label_attribute_name: str = 'pair_obj_label', pair_orientation_attribute_name: str = 'pair_orientation')

Transformer for doing a Paired Prediction with vectors.

Parameters

obj_type – corpus component type being used for analysis: ‘utterance’, ‘speaker’, or ‘conversation’
vector_name – name of the vector matrix containing the bag-of-words vectors
clf – classifier to be used in the paired prediction; by default: standard-scaled logistic regression
pair_id_attribute_name – metadata attribute name to use in annotating object with pair id, default: “pair_id”
label_attribute_name – metadata attribute name to use in annotating object with predicted label, default: “pair_obj_label”
pair_orientation_attribute_name – metadata attribute name to use in annotating object with pair orientation, default: “pair_orientation”

fit(corpus: convokit.model.corpus.Corpus, y=None, selector: Callable[[convokit.model.corpusComponent.CorpusComponent], bool] = <function PairedVectorPrediction.<lambda>>)

Fit the internal classifier to the Corpus component objects.

Parameters

corpus – the target Corpus
selector – selector (lambda) function for which objects should be included in the analysis

Returns

this Transformer object with a fitted internal classifier

get_coefs(feature_names: List[str], coef_func=None)

Get dataframe of classifier coefficients. By default, assumes it is a pipeline with a logistic regression component. For other setups, the user should define a custom coef_func.

Parameters

feature_names – list of feature names to get coefficients for
coef_func – (optional) function for accessing the list of coefficients from the classifier model

Returns

DataFrame of features and coefficients, indexed by feature names

summarize(corpus: convokit.model.corpus.Corpus, selector: Callable[[convokit.model.corpusComponent.CorpusComponent], bool] = <function PairedVectorPrediction.<lambda>>, cv=sklearn.model_selection.KFold)

Run paired prediction on the corpus with cross-validation.

Parameters

corpus – annotated Corpus (with pair information from Pairer.transform())
selector – selector (lambda) function for which objects should be included in the analysis
cv – optional CV model: default is KFold(n_splits=5, shuffle=True)

Returns

cross-validation accuracy score