VectorClassifier
Example usage: bag-of-words classification.
-
class convokit.classifier.vectorClassifier.VectorClassifier(obj_type: str, vector_name: str, columns: List[str] = None, labeller: Callable[[convokit.model.corpusComponent.CorpusComponent], bool] = <function VectorClassifier.<lambda>>, clf=None, clf_attribute_name: str = 'prediction', clf_prob_attribute_name: str = 'pred_score')
Transformer that trains a classifier on the Corpus components’ text vector representation (e.g. bag-of-words, TF-IDF).
The Corpus must already have a vector stored under the specified vector_name.
Inherits from Classifier and has access to its methods.
- Parameters
obj_type – “speaker”, “utterance”, or “conversation”
vector_name – the metadata key where the Corpus object text vector is stored
columns – list of column names of vector matrix to use; uses all columns by default.
labeller – a (lambda) function that takes a Corpus object and returns True (y=1) or False (y=0), i.e. the labeller defines the y value of each object for fitting
clf – an sklearn classifier. By default, clf is a Pipeline consisting of a StandardScaler and a LogisticRegression
clf_attribute_name – the metadata attribute name to store the classifier prediction value under; default: “prediction”
clf_prob_attribute_name – the metadata attribute name to store the classifier prediction score under; default: “pred_score”
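As a rough illustration of the default clf described above, the following sketch builds a scikit-learn Pipeline from a StandardScaler and a LogisticRegression and fits it on a toy bag-of-words matrix. The step names, the matrix, and the labels are assumptions made for this example; ConvoKit's actual default settings may differ.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Made-up bag-of-words counts: rows are objects, columns are vocabulary terms.
X = np.array([
    [3, 0, 1],
    [2, 0, 0],
    [4, 1, 0],
    [0, 3, 1],
    [0, 2, 2],
    [1, 4, 0],
])
# Made-up binary labels, as a labeller would produce (True -> 1, False -> 0).
y = np.array([1, 1, 1, 0, 0, 0])

# Step names ("scaler", "logreg") are illustrative, not ConvoKit's.
clf = Pipeline([
    ("scaler", StandardScaler()),
    ("logreg", LogisticRegression()),
])
clf.fit(X, y)

predictions = clf.predict(X)          # hard 0/1 labels
scores = clf.predict_proba(X)[:, 1]   # probability of the positive class
```

The hard labels and positive-class probabilities are the kind of values that would be stored under clf_attribute_name and clf_prob_attribute_name, respectively.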
-
accuracy(corpus, selector: Callable[[convokit.model.corpusComponent.CorpusComponent], bool] = <function Classifier.<lambda>>)
Calculate the accuracy of the classification.
- Parameters
corpus – target Corpus
selector – (lambda) function selecting objects to include in this accuracy calculation; uses all objects by default
- Returns
float value
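To make the roles of labeller and selector concrete, here is a minimal pure-Python sketch of an accuracy computation. The objects are hypothetical stand-ins for Corpus components (a simple namespace with a meta dict), and the accuracy helper is written for this example; it is not ConvoKit's implementation.

```python
from types import SimpleNamespace

# Hypothetical stand-ins for annotated Corpus components.
objs = [
    SimpleNamespace(id="u1", meta={"is_question": True, "prediction": 1, "length": 12}),
    SimpleNamespace(id="u2", meta={"is_question": False, "prediction": 1, "length": 3}),
    SimpleNamespace(id="u3", meta={"is_question": False, "prediction": 0, "length": 20}),
    SimpleNamespace(id="u4", meta={"is_question": True, "prediction": 0, "length": 8}),
    SimpleNamespace(id="u5", meta={"is_question": True, "prediction": 1, "length": 15}),
]

def labeller(obj):
    """Return the true label of an object (True -> y=1, False -> y=0)."""
    return obj.meta["is_question"]

def accuracy(objs, labeller, selector=lambda obj: True, pred_key="prediction"):
    """Fraction of selected objects whose stored prediction matches the label."""
    selected = [obj for obj in objs if selector(obj)]
    correct = sum(int(labeller(obj)) == obj.meta[pred_key] for obj in selected)
    return correct / len(selected)

overall = accuracy(objs, labeller)
long_only = accuracy(objs, labeller, selector=lambda obj: obj.meta["length"] >= 10)
```

Here overall covers all five objects, while long_only restricts the calculation to the three objects whose length is at least 10.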
-
base_accuracy(corpus, selector: Callable[[convokit.model.corpusComponent.CorpusComponent], bool] = <function Classifier.<lambda>>)
Get the base accuracy, i.e. the larger of the two class proportions (the fraction of objects with y=1 and the fraction with y=0).
- Parameters
corpus – the classified Corpus
selector – (lambda) function selecting objects to include in this accuracy calculation; uses all objects by default
- Returns
float value
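The base accuracy is the score a classifier would get by always predicting the majority class. A minimal sketch, using a made-up list of labels and a helper function written for this example:

```python
def base_accuracy(y_true):
    """Return the accuracy obtained by always predicting the majority class."""
    positive_rate = sum(y_true) / len(y_true)
    return max(positive_rate, 1 - positive_rate)

labels = [1, 1, 1, 0]          # three positives, one negative
baseline = base_accuracy(labels)
```

A classifier is only informative to the extent that its accuracy exceeds this baseline.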
-
classification_report(corpus, selector: Callable[[convokit.model.corpusComponent.CorpusComponent], bool] = <function Classifier.<lambda>>)
Generate a classification report for the transformed corpus, using the labeller for y_true and clf_attribute_name for y_pred.
- Parameters
corpus – target Corpus
selector – (lambda) function selecting objects to include in this classification report
- Returns
classification report
-
confusion_matrix(corpus, selector: Callable[[convokit.model.corpusComponent.CorpusComponent], bool] = <function Classifier.<lambda>>)
Generate a confusion matrix for the transformed corpus, using the labeller for y_true and clf_attribute_name for y_pred.
- Parameters
corpus – target Corpus
selector – (lambda) function selecting objects to include in this confusion_matrix; uses all objects by default
- Returns
sklearn confusion matrix
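Both the classification report and the confusion matrix compare true labels against predicted labels. The sketch below calls the corresponding scikit-learn functions directly on made-up label lists, assuming y_true comes from the labeller and y_pred from the stored predictions:

```python
from sklearn.metrics import classification_report, confusion_matrix

y_true = [1, 0, 1, 1, 0]   # made-up labels, as produced by a labeller
y_pred = [1, 0, 0, 1, 1]   # made-up predictions, as stored under clf_attribute_name

# Rows are true classes, columns are predicted classes, in the order [0, 1].
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])

# Plain-text table of precision, recall, and F1 per class.
report = classification_report(y_true, y_pred)
print(cm)
print(report)
```

With labels ordered as [0, 1], the matrix layout is [[true negatives, false positives], [false negatives, true positives]].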
-
evaluate_with_cv(corpus: convokit.model.corpus.Corpus, cv=sklearn.model_selection.KFold, selector: Callable[[convokit.model.corpusComponent.CorpusComponent], bool] = <function VectorClassifier.<lambda>>)
Evaluate the performance of predictive features (Classifier.pred_feats) in predicting the label, using cross-validation for data splitting.
- Parameters
corpus – target Corpus
cv – cross-validation model to use: KFold(n_splits=5, shuffle=True) by default.
selector – if running on a Corpus, this is a (lambda) function that takes a Corpus object and returns True or False (i.e. include / exclude). By default, the selector includes all objects of the specified type in the Corpus.
- Returns
cross-validated accuracy score
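The cross-validation scheme named above, KFold(n_splits=5, shuffle=True), can be sketched with scikit-learn alone. The synthetic data, the pipeline step names, and the fixed random_state are assumptions for reproducibility in this example; in ConvoKit the matrix and labels would come from the Corpus vector and the labeller.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a vector matrix and binary labels.
X, y = make_classification(n_samples=100, n_features=10, random_state=0)

clf = Pipeline([
    ("scaler", StandardScaler()),
    ("logreg", LogisticRegression()),
])

# Five shuffled folds; each fold is held out once for scoring.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
fold_scores = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")
mean_accuracy = fold_scores.mean()
print(fold_scores, mean_accuracy)
```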
-
evaluate_with_train_test_split(corpus: convokit.model.corpus.Corpus, selector: Callable[[convokit.model.corpusComponent.CorpusComponent], bool] = <function VectorClassifier.<lambda>>, test_size: float = 0.2)
Evaluate the performance of predictive features (Classifier.pred_feats) in predicting the label, using a train-test split.
Run either on a Corpus (with Classifier labeller, selector, obj_type settings) or a list of Corpus objects.
- Parameters
corpus – target Corpus
selector – a (lambda) function that takes a Corpus object and returns True or False (i.e. include / exclude). By default, the selector includes all objects of the specified type in the Corpus.
test_size – size of test set
- Returns
accuracy and confusion matrix
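A train-test evaluation of this kind can be sketched with scikit-learn as follows. The synthetic data and the fixed random_state are assumptions for this example, and the sketch is not ConvoKit's own code path; it only shows how test_size controls the held-out fraction and how the two returned quantities are computed.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a vector matrix and binary labels.
X, y = make_classification(n_samples=100, n_features=10, random_state=0)

# Hold out 20% of the objects for testing (test_size=0.2).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

clf = Pipeline([
    ("scaler", StandardScaler()),
    ("logreg", LogisticRegression()),
])
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
acc = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
print(acc)
print(cm)
```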
-
fit(corpus: convokit.model.corpus.Corpus, selector: Callable[[convokit.model.corpusComponent.CorpusComponent], bool] = <function VectorClassifier.<lambda>>, y=None)
Fit the Transformer’s internal classifier model on the vector matrix that represents one of the Corpus components, with an optional selector that chooses which objects to fit on.
- Parameters
corpus – the target Corpus
selector – a (lambda) function that takes a Corpus object and returns True or False (i.e. include / exclude). By default, the selector includes all objects of the specified type in the Corpus.
- Returns
the fitted VectorClassifier
-
fit_transform(corpus: convokit.model.corpus.Corpus, selector: Callable[[convokit.model.corpusComponent.CorpusComponent], bool] = <function VectorClassifier.<lambda>>) → convokit.model.corpus.Corpus
Runs the fit() and transform() steps in order, with the specified selector.
- Parameters
corpus – the target Corpus
selector – a (lambda) function that takes a Corpus object and returns True or False (i.e. include / exclude). By default, the selector includes all objects of the specified type in the Corpus.
- Returns
the target Corpus annotated
-
get_coefs(feature_names: List[str], coef_func=None)
Get a DataFrame of classifier coefficients.
- Parameters
feature_names – list of feature names to get coefficients for
coef_func – function for accessing the list of coefficients from the classifier model; by default, assumes it is a pipeline with a logistic regression component
- Returns
DataFrame of features and coefficients, indexed by feature names
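For a pipeline with a logistic regression component, a coefficient table of this shape can be assembled by hand. In the sketch below the feature names, the toy data, the step name "logreg", and the column names "feat_name" and "coef" are all assumptions for illustration; the exact layout returned by get_coefs may differ.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

feature_names = ["thanks", "please", "why"]   # hypothetical vocabulary
X = np.array([
    [3, 0, 1],
    [2, 0, 0],
    [4, 1, 0],
    [0, 3, 1],
    [0, 2, 2],
    [1, 4, 0],
])
y = np.array([1, 1, 1, 0, 0, 0])

clf = Pipeline([
    ("scaler", StandardScaler()),
    ("logreg", LogisticRegression()),
])
clf.fit(X, y)

# For binary logistic regression, coef_ has shape (1, n_features).
coefs = clf.named_steps["logreg"].coef_[0]

coef_df = (
    pd.DataFrame({"feat_name": feature_names, "coef": coefs})
    .set_index("feat_name")
    .sort_values("coef", ascending=False)
)
print(coef_df)
```

A custom coef_func would play the role of the named_steps lookup here, for classifiers that expose their coefficients differently.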
-
get_model()
Gets the Classifier’s internal model.
-
get_y_true_pred(corpus, selector: Callable[[convokit.model.corpusComponent.CorpusComponent], bool] = <function Classifier.<lambda>>)
Get lists of true and predicted labels.
- Parameters
corpus – target Corpus
selector – (lambda) function selecting objects to get labels for; uses all objects by default
- Returns
list of true labels, and list of predicted labels
-
set_model(clf)
Sets the Classifier’s internal model.
-
summarize(corpus: convokit.model.corpus.Corpus, selector: Callable[[convokit.model.corpusComponent.CorpusComponent], bool] = <function VectorClassifier.<lambda>>)
Generate a DataFrame indexed by object id with the classifier predictions and scores.
- Parameters
corpus – the annotated Corpus
selector – a (lambda) function that takes a Corpus object and returns True or False (i.e. include / exclude). By default, the selector includes all objects of the specified type in the Corpus.
- Returns
a pandas DataFrame
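The shape of such a summary can be illustrated with pandas alone. The records below are made up, the column names follow the default attribute names ("prediction" and "pred_score"), and sorting by score is an assumption added for readability rather than documented behaviour.

```python
import pandas as pd

# Made-up annotations, one record per classified object.
annotated = [
    {"id": "u1", "prediction": 1, "pred_score": 0.91},
    {"id": "u2", "prediction": 0, "pred_score": 0.12},
    {"id": "u3", "prediction": 1, "pred_score": 0.67},
]

# Index by object id; sort so the most confident positives come first.
summary = (
    pd.DataFrame(annotated)
    .set_index("id")
    .sort_values("pred_score", ascending=False)
)
print(summary)
```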
-
summarize_objs(objs: List[convokit.model.corpusComponent.CorpusComponent])
Not implemented for VectorClassifier.
-
transform(corpus: convokit.model.corpus.Corpus, selector: Callable[[convokit.model.corpusComponent.CorpusComponent], bool] = <function VectorClassifier.<lambda>>) → convokit.model.corpus.Corpus
Annotate the corpus components with the classifier prediction and prediction score, with an optional selector that chooses which objects to classify. Objects that are not selected will get a metadata value of ‘None’ instead of the classifier prediction.
- Parameters
corpus – the target Corpus
selector – a (lambda) function that takes a Corpus object and returns True or False (i.e. include / exclude). By default, the selector includes all objects of the specified type in the Corpus.
- Returns
the target Corpus annotated
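The effect of the selector during annotation can be sketched in pure Python. The objects, the toy_predict function, and the annotate helper are hypothetical stand-ins written for this example; setting the score of unselected objects to None as well is an assumption, since the description above only mentions the prediction.

```python
from types import SimpleNamespace

# Hypothetical stand-ins for Corpus components.
objs = [
    SimpleNamespace(id="u1", meta={"length": 12}),
    SimpleNamespace(id="u2", meta={"length": 3}),
    SimpleNamespace(id="u3", meta={"length": 8}),
]

def toy_predict(obj):
    """Stand-in for a fitted classifier: returns (prediction, score)."""
    score = min(obj.meta["length"] / 20, 1.0)
    return int(score >= 0.5), score

def annotate(objs, predict, selector=lambda obj: True,
             clf_attribute_name="prediction",
             clf_prob_attribute_name="pred_score"):
    """Write predictions into metadata; unselected objects get None."""
    for obj in objs:
        if selector(obj):
            pred, score = predict(obj)
        else:
            pred, score = None, None  # assumption: the score is also None
        obj.meta[clf_attribute_name] = pred
        obj.meta[clf_prob_attribute_name] = score
    return objs

# Only classify objects with at least 5 tokens; u2 is skipped.
annotate(objs, toy_predict, selector=lambda obj: obj.meta["length"] >= 5)
```

After this call, u1 and u3 carry a 0/1 prediction, while u2 carries None because the selector excluded it.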
-
transform_objs(objs: List[convokit.model.corpusComponent.CorpusComponent]) → List[convokit.model.corpusComponent.CorpusComponent]
Not implemented for VectorClassifier.