Classifier

Example usage: politeness classification.

class convokit.classifier.classifier.Classifier(obj_type: str, pred_feats: List[str], labeller: Callable[[convokit.model.corpusComponent.CorpusComponent], bool] = <function Classifier.<lambda>>, clf=None, clf_attribute_name: str = 'prediction', clf_feat_name=None, clf_prob_attribute_name: str = 'pred_score', clf_prob_feat_name=None)

Transformer that trains a classifier on the specified features of a Corpus’s objects.

Runs on the Corpus’s Speakers, Utterances, or Conversations (as specified by obj_type).

Parameters:
  • obj_type – type of Corpus object to classify: ‘conversation’, ‘speaker’, or ‘utterance’
  • pred_feats – list of metadata attributes containing the features to be used in prediction. If a metadata attribute contains a dictionary, each key of that dictionary is included as a feature. Each feature should have a numeric or boolean type.
  • labeller – a (lambda) function that takes a Corpus object and returns True (y=1) or False (y=0); i.e., the labeller defines each object’s y value for fitting
  • clf – optional sklearn classifier model. By default, clf is a Pipeline with StandardScaler and LogisticRegression.
  • clf_attribute_name – the metadata attribute name to store the classifier prediction value under; default: “prediction”
  • clf_prob_attribute_name – the metadata attribute name to store the classifier prediction score under; default: “pred_score”
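
To make the roles of pred_feats and labeller concrete, the sketch below turns one object into an (x, y) training pair without using ConvoKit. FakeObj, extract_features, make_example, and the metadata keys (politeness_strategies, length, is_question) are hypothetical stand-ins; the real Classifier's internal feature extraction may differ in detail, for example in how it names the keys of a dictionary-valued attribute.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class FakeObj:
    """Hypothetical stand-in for a ConvoKit CorpusComponent: an id plus a metadata dict."""
    id: str
    meta: Dict[str, Any] = field(default_factory=dict)


def extract_features(obj: FakeObj, pred_feats: List[str]) -> Dict[str, float]:
    """Collect numeric/boolean features; a dict-valued attribute contributes all of its keys."""
    features: Dict[str, float] = {}
    for attr in pred_feats:
        value = obj.meta[attr]
        if isinstance(value, dict):
            for key, sub_value in value.items():
                features[key] = float(sub_value)
        else:
            features[attr] = float(value)
    return features


def make_example(obj: FakeObj, pred_feats: List[str],
                 labeller: Callable[[FakeObj], bool]) -> Tuple[Dict[str, float], int]:
    # The labeller's True/False becomes y = 1 / y = 0.
    return extract_features(obj, pred_feats), int(labeller(obj))


utt = FakeObj("u1", {"politeness_strategies": {"please": 1, "gratitude": 0},
                     "length": 12, "is_question": True})
x, y = make_example(utt, ["politeness_strategies", "length"],
                    labeller=lambda o: o.meta["is_question"])
print(x, y)  # {'please': 1.0, 'gratitude': 0.0, 'length': 12.0} 1
```
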
accuracy(corpus, selector: Callable[[convokit.model.corpusComponent.CorpusComponent], bool] = <function Classifier.<lambda>>)

Calculate the accuracy of the classification.

Parameters:
  • corpus – target Corpus
  • selector – (lambda) function selecting objects to include in this accuracy calculation; uses all objects by default
Returns:

float value
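
A minimal sketch of the accuracy computation, assuming (as classification_report and confusion_matrix state for their own outputs) that y_true comes from the labeller and y_pred from the clf_attribute_name metadata written by transform. Objects are modelled as plain metadata dicts; accuracy_sketch and the "label" key are hypothetical.

```python
from typing import Any, Callable, Dict, List

Meta = Dict[str, Any]  # hypothetical stand-in: an object is just its metadata dict


def accuracy_sketch(objs: List[Meta],
                    labeller: Callable[[Meta], bool],
                    selector: Callable[[Meta], bool] = lambda o: True,
                    clf_attribute_name: str = "prediction") -> float:
    """Fraction of selected objects whose stored prediction matches the labeller's y value."""
    selected = [o for o in objs if selector(o)]
    correct = sum(int(labeller(o)) == o[clf_attribute_name] for o in selected)
    return correct / len(selected)


objs = [
    {"label": True, "prediction": 1},
    {"label": False, "prediction": 1},
    {"label": True, "prediction": 1},
    {"label": False, "prediction": 0},
]
print(accuracy_sketch(objs, labeller=lambda o: o["label"]))  # 0.75
print(accuracy_sketch(objs, labeller=lambda o: o["label"],
                      selector=lambda o: o["label"]))  # 1.0
```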

base_accuracy(corpus, selector: Callable[[convokit.model.corpusComponent.CorpusComponent], bool] = <function Classifier.<lambda>>)

Get the base accuracy, i.e. the larger of the proportions of objects labelled y=1 and y=0. This equals the accuracy of always predicting the majority class.

Parameters:
  • corpus – the classified Corpus
  • selector – (lambda) function selecting objects to include in this accuracy calculation; uses all objects by default
Returns:

float value
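
The base accuracy is a floor for interpreting accuracy. A sketch of the definition above over a hypothetical list of y values produced by the labeller (base_accuracy_sketch is an illustrative helper, not ConvoKit API):

```python
from typing import Sequence


def base_accuracy_sketch(y_true: Sequence[int]) -> float:
    """Larger of the proportions of y=1 and y=0 labels (majority-class accuracy)."""
    n_pos = sum(1 for y in y_true if y)
    n_neg = len(y_true) - n_pos
    return max(n_pos, n_neg) / len(y_true)


print(base_accuracy_sketch([1, 1, 1, 0]))     # 0.75
print(base_accuracy_sketch([0, 0, 1, 1, 1]))  # 0.6
```

For example, if 75% of the selected objects have y=1, an accuracy of 0.75 is no better than always predicting y=1.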

classification_report(corpus, selector: Callable[[convokit.model.corpusComponent.CorpusComponent], bool] = <function Classifier.<lambda>>)

Generate a classification report for a transformed Corpus, using the labeller for y_true and the clf_attribute_name metadata as y_pred.

Parameters:
  • corpus – target Corpus
  • selector – (lambda) function selecting objects to include in this classification report; uses all objects by default
Returns:

classification report
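
sklearn's classification report lists precision, recall, F1-score, and support for each class. The sketch below computes those four numbers in plain Python for one class, to show what the report summarizes; per_class_metrics is a hypothetical helper, not part of ConvoKit or sklearn.

```python
from typing import List


def per_class_metrics(y_true: List[int], y_pred: List[int], label: int) -> dict:
    """Precision, recall, F1 and support for one class, treating `label` as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1-score": f1, "support": tp + fn}


y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1]
for label in (0, 1):
    print(label, per_class_metrics(y_true, y_pred, label))
```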

confusion_matrix(corpus, selector: Callable[[convokit.model.corpusComponent.CorpusComponent], bool] = <function Classifier.<lambda>>)

Generate a confusion matrix for a transformed Corpus, using the labeller for y_true and the clf_attribute_name metadata as y_pred.

Parameters:
  • corpus – target Corpus
  • selector – (lambda) function selecting objects to include in this confusion_matrix; uses all objects by default
Returns:

sklearn confusion matrix
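
sklearn's confusion matrix puts true labels on the rows and predicted labels on the columns, so for binary labels ordered (0, 1) it reads [[TN, FP], [FN, TP]]. A plain-Python sketch of that layout, using the kind of y_true/y_pred lists the labeller and clf_attribute_name would supply (confusion_matrix_sketch is hypothetical):

```python
from typing import List, Sequence


def confusion_matrix_sketch(y_true: Sequence[int], y_pred: Sequence[int]) -> List[List[int]]:
    """2x2 counts: row = true label, column = predicted label."""
    matrix = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        matrix[int(t)][int(p)] += 1
    return matrix


print(confusion_matrix_sketch([1, 0, 1, 1, 0], [1, 0, 0, 1, 1]))  # [[1, 1], [1, 2]]
```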

evaluate_with_cv(corpus: convokit.model.corpus.Corpus = None, objs: List[convokit.model.corpusComponent.CorpusComponent] = None, cv=KFold(n_splits=5, random_state=None, shuffle=True), selector: Callable[[convokit.model.corpusComponent.CorpusComponent], bool] = <function Classifier.<lambda>>)

Evaluate the performance of the predictive features (Classifier.pred_feats) in predicting the label, using cross-validation for data splitting.

This method can be run on either a Corpus (passed in as the corpus parameter) or a list of Corpus component objects (passed in as the objs parameter). If run on a Corpus, the cross-validation will be run with the Classifier’s labeller and obj_type settings, and the selector parameter of this function.

Parameters:
  • corpus – target Corpus (do not pass in objs if using this)
  • objs – target list of Corpus objects (do not pass in corpus if using this)
  • cv – cross-validation splitter to use; KFold(n_splits=5, shuffle=True) by default.
  • selector – if running on a Corpus, this is a (lambda) function that takes a Corpus object and returns True or False (i.e. include / exclude). By default, the selector includes all objects of the specified type in the Corpus.
Returns:

cross-validated accuracy score
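
To show what the default cv=KFold(n_splits=5, shuffle=True) does, here is a plain-Python sketch of shuffled k-fold splitting, followed by a cross-validated score for a stand-in model that simply predicts the training folds' majority class. kfold_indices and majority_baseline_cv are hypothetical; evaluate_with_cv evaluates the Classifier's clf on pred_feats instead of this baseline, and averaging per-fold accuracies is one common way to report a cross-validated score, assumed here.

```python
import random
from typing import List, Tuple


def kfold_indices(n: int, n_splits: int = 5, shuffle: bool = True,
                  seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return (train_indices, test_indices) pairs, one per fold."""
    if n < n_splits:
        raise ValueError("need at least n_splits items")
    indices = list(range(n))
    if shuffle:
        random.Random(seed).shuffle(indices)
    # As in sklearn's KFold, the first n % n_splits folds get one extra item.
    fold_sizes = [n // n_splits + (1 if i < n % n_splits else 0) for i in range(n_splits)]
    splits, start = [], 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        splits.append((train, test))
        start += size
    return splits


def majority_baseline_cv(y: List[int], n_splits: int = 5) -> float:
    """Mean test-fold accuracy of a model that predicts its training folds' majority label."""
    scores = []
    for train, test in kfold_indices(len(y), n_splits):
        n_pos = sum(y[i] for i in train)
        majority = 1 if 2 * n_pos >= len(train) else 0
        scores.append(sum(y[i] == majority for i in test) / len(test))
    return sum(scores) / len(scores)


splits = kfold_indices(12, n_splits=5)
print(sorted(len(test) for _, test in splits))  # [2, 2, 2, 3, 3]
print(majority_baseline_cv([1] * 10))  # 1.0
```

Every index lands in exactly one test fold, so each object is scored once by a model that never saw it during fitting.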

evaluate_with_train_test_split(corpus: convokit.model.corpus.Corpus = None, objs: List[convokit.model.corpusComponent.CorpusComponent] = None, selector: Callable[[convokit.model.corpusComponent.CorpusComponent], bool] = <function Classifier.<lambda>>, test_size: float = 0.2)

Evaluate the performance of the predictive features (Classifier.pred_feats) in predicting the label, using a train-test split.

Can be run either on a Corpus (using the Classifier’s labeller and obj_type settings, plus this method’s selector parameter) or on a list of Corpus objects.

Parameters:
  • corpus – target Corpus
  • objs – target list of Corpus objects
  • selector – if running on a Corpus, this is a (lambda) function that takes a Corpus object and returns True or False (i.e. include / exclude). By default, the selector includes all objects of the specified type in the Corpus.
  • test_size – fraction of the objects to hold out as the test set; default 0.2
Returns:

accuracy and confusion matrix
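
A plain-Python sketch of a shuffled train-test split with test_size given as a fraction. It follows sklearn's train_test_split convention of rounding the test set size up; whether evaluate_with_train_test_split rounds the same way is an assumption, and train_test_split_sketch is a hypothetical helper.

```python
import math
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_test_split_sketch(objs: Sequence[T], test_size: float = 0.2,
                            seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle objs and hold out ceil(test_size * len(objs)) of them for testing."""
    shuffled = list(objs)
    random.Random(seed).shuffle(shuffled)
    n_test = math.ceil(test_size * len(shuffled))
    return shuffled[n_test:], shuffled[:n_test]


train, test = train_test_split_sketch(list(range(10)))
print(len(train), len(test))  # 8 2
```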

fit(corpus: convokit.model.corpus.Corpus, y=None, selector: Callable[[convokit.model.corpusComponent.CorpusComponent], bool] = <function Classifier.<lambda>>)

Trains the Transformer’s classifier model, with an optional selector that filters the objects to fit on.

Parameters:
  • corpus – target Corpus
  • selector – a (lambda) function that takes a Corpus object and returns True or False (i.e. include / exclude). By default, the selector includes all objects of the specified type in the Corpus.
Returns:

the fitted Classifier Transformer
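
The selector decides which objects the model sees during fitting. The sketch below shows how selected objects might be turned into a feature matrix X and a label vector y, using metadata dicts as hypothetical stand-ins for Corpus objects; build_training_data and the "split", "hedges", "please", and "polite" keys are illustrative names, not ConvoKit API. Fitting on one subset and evaluating on the rest is a common pattern.

```python
from typing import Any, Callable, Dict, List, Tuple

Meta = Dict[str, Any]  # hypothetical stand-in: an object is just its metadata dict


def build_training_data(objs: List[Meta], pred_feats: List[str],
                        labeller: Callable[[Meta], bool],
                        selector: Callable[[Meta], bool] = lambda o: True
                        ) -> Tuple[List[List[float]], List[int]]:
    """Feature rows and 0/1 labels for the objects that pass the selector."""
    X, y = [], []
    for obj in objs:
        if not selector(obj):
            continue  # excluded objects never reach the model
        X.append([float(obj[f]) for f in pred_feats])
        y.append(int(labeller(obj)))
    return X, y


objs = [
    {"split": "train", "hedges": 2, "please": 1, "polite": True},
    {"split": "train", "hedges": 0, "please": 0, "polite": False},
    {"split": "test", "hedges": 1, "please": 1, "polite": True},
]
X, y = build_training_data(objs, ["hedges", "please"],
                           labeller=lambda o: o["polite"],
                           selector=lambda o: o["split"] == "train")
print(X, y)  # [[2.0, 1.0], [0.0, 0.0]] [1, 0]
```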

fit_transform(corpus: convokit.model.corpus.Corpus, y=None, selector: Callable[[convokit.model.corpusComponent.CorpusComponent], bool] = <function Classifier.<lambda>>) → convokit.model.corpus.Corpus

Fit and run the Transformer on a single Corpus.

Parameters:
  • corpus – the Corpus to use
Returns:

same as transform

get_coefs(feature_names: List[str], coef_func=None)

Get a DataFrame of the classifier’s coefficients.

Parameters:
  • feature_names – list of feature names to get coefficients for
  • coef_func – function for accessing the list of coefficients from the classifier model; by default, assumes it is a pipeline with a logistic regression component
Returns:

DataFrame of features and coefficients, indexed by feature names
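
The coef_func parameter only needs to map the fitted model to a sequence of coefficients aligned with feature_names. The sketch below uses a hypothetical FakeLogReg whose coef_ attribute has the shape a binary sklearn LogisticRegression uses (one row of coefficients), and returns a plain dict sorted from most positive to most negative coefficient rather than the DataFrame that get_coefs returns. coefs_by_feature and the feature names are hypothetical.

```python
from typing import Any, Callable, Dict, List, Sequence


class FakeLogReg:
    """Hypothetical fitted model; a binary LogisticRegression's coef_ has shape (1, n_features)."""
    coef_ = [[0.8, -1.2, 0.1]]


def coefs_by_feature(model: Any, feature_names: List[str],
                     coef_func: Callable[[Any], Sequence[float]]) -> Dict[str, float]:
    """Pair each feature name with its coefficient, most positive first."""
    coefs = list(coef_func(model))
    if len(coefs) != len(feature_names):
        raise ValueError("feature_names must align one-to-one with the model's coefficients")
    pairs = sorted(zip(feature_names, coefs), key=lambda p: p[1], reverse=True)
    return dict(pairs)


result = coefs_by_feature(FakeLogReg(), ["please", "direct_question", "hedges"],
                          coef_func=lambda m: m.coef_[0])
print(result)  # {'please': 0.8, 'hedges': 0.1, 'direct_question': -1.2}
```

A custom coef_func should only be needed when clf is not the default pipeline; how to reach the logistic regression step inside a custom pipeline depends on how that pipeline's steps were named.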

get_model()

Gets the Classifier’s internal model

get_y_true_pred(corpus, selector: Callable[[convokit.model.corpusComponent.CorpusComponent], bool] = <function Classifier.<lambda>>)

Get lists of true and predicted labels

Parameters:
  • corpus – target Corpus
  • selector – (lambda) function selecting objects to get labels for; uses all objects by default
Returns:

list of true labels, and list of predicted labels

set_model(clf)

Sets the Classifier’s internal model

summarize(corpus: convokit.model.corpus.Corpus, selector: Callable[[convokit.model.corpusComponent.CorpusComponent], bool] = <function Classifier.<lambda>>)

Generate a pandas DataFrame (indexed by object id, with prediction and prediction score columns) of classification results.

Runs on a target Corpus; to summarize a list of Corpus objects instead, use summarize_objs.

Parameters:
  • corpus – target Corpus
  • selector – a (lambda) function that takes a Corpus object and returns True or False (i.e. include / exclude). By default, the selector includes all objects of the specified type in the Corpus.
Returns:

pandas DataFrame indexed by Corpus object id
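
To illustrate the shape of the summary, the sketch below builds one row per selected object, keyed by object id, with a prediction column and a prediction-score column. It uses plain dicts instead of a pandas DataFrame, names the columns after the default metadata attribute names (an assumption), and sorts rows by score only for readability; whether summarize orders its rows is not specified here. summarize_sketch and the object ids are hypothetical.

```python
from typing import Any, Callable, Dict

Meta = Dict[str, Any]  # hypothetical stand-in: an object is just its metadata dict


def summarize_sketch(objs: Dict[str, Meta],
                     selector: Callable[[Meta], bool] = lambda o: True,
                     clf_attribute_name: str = "prediction",
                     clf_prob_attribute_name: str = "pred_score") -> Dict[str, Dict[str, Any]]:
    """Map object id -> {prediction, score} for selected objects, highest score first."""
    rows = {
        obj_id: {clf_attribute_name: meta[clf_attribute_name],
                 clf_prob_attribute_name: meta[clf_prob_attribute_name]}
        for obj_id, meta in objs.items() if selector(meta)
    }
    return dict(sorted(rows.items(), key=lambda kv: kv[1][clf_prob_attribute_name], reverse=True))


annotated = {
    "u1": {"prediction": 0, "pred_score": 0.31},
    "u2": {"prediction": 1, "pred_score": 0.92},
    "u3": {"prediction": 1, "pred_score": 0.67},
}
print(list(summarize_sketch(annotated)))  # ['u2', 'u3', 'u1']
```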

summarize_objs(objs: List[convokit.model.corpusComponent.CorpusComponent])

Generate a pandas DataFrame (indexed by object id, with prediction and prediction score columns) of classification results.

Runs on a list of Corpus objects.

Parameters:
  • objs – list of Corpus objects
Returns:

pandas DataFrame indexed by Corpus object id

transform(corpus: convokit.model.corpus.Corpus, selector: Callable[[convokit.model.corpusComponent.CorpusComponent], bool] = <function Classifier.<lambda>>) → convokit.model.corpus.Corpus

Run the classifier on the given Corpus’s objects and annotate them with the predictions and prediction scores, with an optional selector that filters the objects to be classified. Objects that are not selected get a metadata value of None instead of the classifier prediction.

Parameters:
  • corpus – target Corpus
  • selector – a (lambda) function that takes a Corpus object and returns True or False (i.e. include / exclude). By default, the selector includes all objects of the specified type in the Corpus.
Returns:

annotated Corpus
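
The sketch below mimics the annotation step for a single-feature logistic model with hand-picked weights: selected objects receive a 0/1 prediction and a score in [0, 1], and unselected objects receive None. The weights, the 0.5 threshold, and storing the positive-class probability as the score are assumptions made for illustration (they mirror how a binary logistic regression's predict and predict_proba relate). transform_sketch is hypothetical, and setting the score to None for unselected objects is also an assumption, since the description above states this only for the prediction.

```python
import math
from typing import Any, Callable, Dict, List

Meta = Dict[str, Any]  # hypothetical stand-in: an object is just its metadata dict


def transform_sketch(objs: List[Meta], pred_feats: List[str], weights: List[float], bias: float,
                     selector: Callable[[Meta], bool] = lambda o: True,
                     clf_attribute_name: str = "prediction",
                     clf_prob_attribute_name: str = "pred_score") -> List[Meta]:
    """Annotate objects in place with a prediction and a score (None if not selected)."""
    for obj in objs:
        if selector(obj):
            z = bias + sum(w * float(obj[f]) for w, f in zip(weights, pred_feats))
            score = 1.0 / (1.0 + math.exp(-z))  # logistic function: P(y = 1)
            obj[clf_attribute_name] = int(score > 0.5)
            obj[clf_prob_attribute_name] = score
        else:
            obj[clf_attribute_name] = None  # not selected: no prediction
            obj[clf_prob_attribute_name] = None
    return objs


objs = [{"x": 2.0}, {"x": -2.0}, {"x": 5.0, "skip": True}]
transform_sketch(objs, ["x"], weights=[1.0], bias=0.0, selector=lambda o: not o.get("skip"))
print([o["prediction"] for o in objs])  # [1, 0, None]
```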

transform_objs(objs: List[convokit.model.corpusComponent.CorpusComponent]) → List[convokit.model.corpusComponent.CorpusComponent]

Run the classifier on a list of Corpus objects and annotate them with the predictions and prediction scores.

Parameters:
  • objs – list of Corpus objects
Returns:

list of annotated Corpus objects