VectorClassifier
Example usage: bag-of-words classification.
-
class convokit.classifier.vectorClassifier.VectorClassifier(obj_type: str, vector_name: str, columns: List[str] = None, labeller: Callable[[convokit.model.corpusComponent.CorpusComponent], bool] = <function VectorClassifier.<lambda>>, clf=None, clf_attribute_name: str = 'prediction', clf_prob_attribute_name: str = 'pred_score')
Transformer that trains a classifier on vector representations of the Corpus components’ text (e.g. bag-of-words or TF-IDF).
Corpus must have a vector with the specified vector_name.
Inherits from Classifier and has access to its methods.
Parameters: - obj_type – “speaker”, “utterance”, or “conversation”
- vector_name – the metadata key where the Corpus object text vector is stored
- columns – list of column names of vector matrix to use; uses all columns by default.
- labeller – a (lambda) function that takes a Corpus component (speaker, utterance, or conversation) and returns True (y=1) or False (y=0), i.e. labeller defines the y value of the object for fitting
- clf – an sklearn classifier. By default, clf is a Pipeline with StandardScaler and LogisticRegression
- clf_attribute_name – the metadata attribute name to store the classifier prediction value under; default: “prediction”
- clf_prob_attribute_name – the metadata attribute name to store the classifier prediction score under; default: “pred_score”
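As a minimal sketch of how these parameters fit together, the function below constructs a classifier over utterance vectors. The vector name `bow` and the metadata key `is_question` are hypothetical and would need to exist in your Corpus. The import is deferred into the function body so the sketch can be loaded without ConvoKit installed.

```python
def build_vector_classifier():
    """Construct a VectorClassifier over utterance-level vectors."""
    # Deferred import: ConvoKit is only needed when the function is called.
    from convokit.classifier.vectorClassifier import VectorClassifier

    return VectorClassifier(
        obj_type="utterance",
        vector_name="bow",  # hypothetical vector name; must already be in the Corpus
        labeller=lambda utt: utt.meta["is_question"],  # hypothetical metadata key
        clf_attribute_name="prediction",       # documented default
        clf_prob_attribute_name="pred_score",  # documented default
    )
```

Leaving `clf` unset keeps the documented default model, a Pipeline with StandardScaler and LogisticRegression.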
-
accuracy(corpus, selector: Callable[[convokit.model.corpusComponent.CorpusComponent], bool] = <function Classifier.<lambda>>)
Calculate the accuracy of the classification.
Parameters: - corpus – target Corpus
- selector – (lambda) function selecting objects to include in this accuracy calculation; uses all objects by default
Returns: float value
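The interaction between labeller, selector, and the stored prediction can be sketched in plain Python. This is an illustration of the idea rather than ConvoKit's implementation; the stand-in objects expose only a `meta` dict, and the `label` key is hypothetical.

```python
from types import SimpleNamespace


def selected_accuracy(objs, labeller, selector=lambda obj: True,
                      clf_attribute_name="prediction"):
    """Fraction of selected objects whose stored prediction matches the label."""
    selected = [obj for obj in objs if selector(obj)]
    if not selected:
        raise ValueError("selector matched no objects")
    correct = sum(
        1 for obj in selected
        if bool(obj.meta[clf_attribute_name]) == bool(labeller(obj))
    )
    return correct / len(selected)


# Stand-in objects (not real ConvoKit components).
objs = [
    SimpleNamespace(meta={"label": True, "prediction": 1}),
    SimpleNamespace(meta={"label": False, "prediction": 1}),
    SimpleNamespace(meta={"label": False, "prediction": 0}),
    SimpleNamespace(meta={"label": True, "prediction": 1}),
]
score = selected_accuracy(objs, labeller=lambda obj: obj.meta["label"])
```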
-
base_accuracy(corpus, selector: Callable[[convokit.model.corpusComponent.CorpusComponent], bool] = <function Classifier.<lambda>>)
Get the base accuracy, i.e. the larger of the fractions of results that are y=1 and y=0.
Parameters: - corpus – the classified Corpus
- selector – (lambda) function selecting objects to include in this accuracy calculation; uses all objects by default
Returns: float value
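The base accuracy is the score a classifier would reach by always predicting the more common class. A minimal sketch, assuming the fractions are taken over the true labels:

```python
def base_accuracy(y_true):
    """Return the larger of the y=1 and y=0 fractions in y_true."""
    if not y_true:
        raise ValueError("y_true is empty")
    positive_rate = sum(1 for y in y_true if y) / len(y_true)
    return max(positive_rate, 1 - positive_rate)


# One positive out of four labels: the majority class (y=0) covers 3/4.
majority_baseline = base_accuracy([1, 0, 0, 0])
```

Comparing accuracy against this value shows whether the classifier does better than a constant guess.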
-
classification_report(corpus, selector: Callable[[convokit.model.corpusComponent.CorpusComponent], bool] = <function Classifier.<lambda>>)
Generate a classification report for the transformed corpus, using labeller for y_true and clf_attribute_name for y_pred.
Parameters: - corpus – target Corpus
- selector – (lambda) function selecting objects to include in this classification report
Returns: classification report
-
confusion_matrix(corpus, selector: Callable[[convokit.model.corpusComponent.CorpusComponent], bool] = <function Classifier.<lambda>>)
Generate a confusion matrix for the transformed corpus, using labeller for y_true and clf_attribute_name for y_pred.
Parameters: - corpus – target Corpus
- selector – (lambda) function selecting objects to include in this confusion_matrix; uses all objects by default
Returns: sklearn confusion matrix
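For binary labels, an sklearn confusion matrix places true labels on the rows and predictions on the columns, giving the layout [[TN, FP], [FN, TP]]. The plain-Python sketch below reproduces that layout for illustration only; it is not the code ConvoKit runs.

```python
def binary_confusion_matrix(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]]: rows are true labels, columns are predictions."""
    matrix = [[0, 0], [0, 0]]
    for truth, pred in zip(y_true, y_pred):
        matrix[int(bool(truth))][int(bool(pred))] += 1
    return matrix


matrix = binary_confusion_matrix(
    y_true=[1, 0, 0, 1, 1],
    y_pred=[1, 1, 0, 0, 1],
)
```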
-
evaluate_with_cv(corpus: convokit.model.corpus.Corpus, cv=KFold(n_splits=5, random_state=None, shuffle=True), selector: Callable[[convokit.model.corpusComponent.CorpusComponent], bool] = <function VectorClassifier.<lambda>>)
Evaluate the performance of predictive features (Classifier.pred_feats) in predicting for the label, using cross-validation for data splitting.
Parameters: - corpus – target Corpus
- cv – cross-validation model to use: KFold(n_splits=5, shuffle=True) by default.
- selector – a (lambda) function that takes a Corpus component and returns True or False (i.e. include / exclude). By default, the selector includes all objects of the specified type in the Corpus.
Returns: cross-validated accuracy score
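To make the default splitting scheme concrete, the sketch below partitions sample indices into shuffled folds in the manner of `KFold(n_splits=5, shuffle=True)`. It is a stdlib illustration under that assumption; the exact shuffling order will differ from scikit-learn's.

```python
import random


def shuffled_k_fold(n_samples, n_splits=5, seed=None):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    if not 2 <= n_splits <= n_samples:
        raise ValueError("need 2 <= n_splits <= n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # The first `extra` folds receive one additional sample.
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


folds = list(shuffled_k_fold(12, n_splits=5, seed=0))
```

Each sample lands in exactly one test fold, so every object contributes to the cross-validated score once.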
-
evaluate_with_train_test_split(corpus: convokit.model.corpus.Corpus, selector: Callable[[convokit.model.corpusComponent.CorpusComponent], bool] = <function VectorClassifier.<lambda>>, test_size: float = 0.2)
Evaluate the performance of predictive features (Classifier.pred_feats) in predicting for the label, using a train-test split.
Run either on a Corpus (with Classifier labeller, selector, obj_type settings) or a list of Corpus objects
Parameters: - corpus – target Corpus
- selector – a (lambda) function that takes a Corpus object and returns True or False (i.e. include / exclude). By default, the selector includes all objects of the specified type in the Corpus.
- test_size – size of test set
Returns: accuracy and confusion matrix
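The effect of `test_size` can be sketched as a shuffled hold-out split. This is a stdlib illustration, not the library's code; rounding the test-set size up is an assumption made here.

```python
import math
import random


def holdout_split(items, test_size=0.2, seed=None):
    """Shuffle items and hold out a test_size fraction as the test set."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_test = math.ceil(test_size * len(items))  # assumption: round up
    return items[n_test:], items[:n_test]


train, test = holdout_split(range(10), test_size=0.2, seed=0)
```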
-
fit(corpus: convokit.model.corpus.Corpus, selector: Callable[[convokit.model.corpusComponent.CorpusComponent], bool] = <function VectorClassifier.<lambda>>, y=None)
Fit the Transformer’s internal classifier model on the vector matrix that represents one of the Corpus components, with an optional selector that selects for objects to be fit on.
Parameters: - corpus – the target Corpus
- selector – a (lambda) function that takes a Corpus object and returns True or False (i.e. include / exclude). By default, the selector includes all objects of the specified type in the Corpus.
Returns: the fitted VectorClassifier
-
fit_transform(corpus: convokit.model.corpus.Corpus, selector: Callable[[convokit.model.corpusComponent.CorpusComponent], bool] = <function VectorClassifier.<lambda>>) → convokit.model.corpus.Corpus
Runs the fit() and transform() steps in order, with the specified selector.
Parameters: - corpus – the target Corpus
- selector – a (lambda) function that takes a Corpus object and returns True or False (i.e. include / exclude). By default, the selector includes all objects of the specified type in the Corpus.
Returns: the target Corpus annotated
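Because fit_transform() applies one selector to both steps, calling fit() and transform() separately is useful when training and annotation should cover different objects. A hedged sketch, assuming every object carries a hypothetical `split` metadata value of "train" or "test":

```python
def fit_on_train_annotate_test(clf, corpus, split_key="split"):
    """Fit on objects marked 'train', then annotate objects marked 'test'.

    `clf` is expected to be a VectorClassifier (or anything with the same
    fit/transform interface); `split_key` is a hypothetical metadata key.
    """
    clf.fit(corpus, selector=lambda obj: obj.meta[split_key] == "train")
    return clf.transform(corpus, selector=lambda obj: obj.meta[split_key] == "test")
```

Objects outside the test selector are not classified by the transform() call.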
-
get_coefs(feature_names: List[str], coef_func=None)
Get dataframe of classifier coefficients.
Parameters: - feature_names – list of feature names to get coefficients for
- coef_func – function for accessing the list of coefficients from the classifier model; by default, assumes it is a pipeline with a logistic regression component
Returns: DataFrame of features and coefficients, indexed by feature names
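When clf is not the default pipeline, a custom coef_func is needed. The sketch below assumes that coef_func is called with the classifier model and should return a list of coefficients, and that the model is a bare binary linear classifier whose `coef_` has shape (1, n_features); both points are assumptions to check against your model.

```python
def linear_model_coefs(model):
    """Return the coefficients of a fitted binary linear model as a plain list."""
    return [float(c) for c in model.coef_[0]]
```

Under those assumptions it would be passed as `coef_func=linear_model_coefs` when calling get_coefs.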
-
get_model()
Gets the Classifier’s internal model.
-
get_y_true_pred(corpus, selector: Callable[[convokit.model.corpusComponent.CorpusComponent], bool] = <function Classifier.<lambda>>)
Get lists of true and predicted labels.
Parameters: - corpus – target Corpus
- selector – (lambda) function selecting objects to get labels for; uses all objects by default
Returns: list of true labels, and list of predicted labels
-
set_model(clf)
Sets the Classifier’s internal model.
-
summarize(corpus: convokit.model.corpus.Corpus, selector: Callable[[convokit.model.corpusComponent.CorpusComponent], bool] = <function VectorClassifier.<lambda>>)
Generate a DataFrame indexed by object id with the classifier predictions and scores.
Parameters: - corpus – the annotated Corpus
- selector – a (lambda) function that takes a Corpus object and returns True or False (i.e. include / exclude). By default, the selector includes all objects of the specified type in the Corpus.
Returns: a pandas DataFrame
-
summarize_objs(objs: List[convokit.model.corpusComponent.CorpusComponent])
Not implemented for VectorClassifier.
-
transform(corpus: convokit.model.corpus.Corpus, selector: Callable[[convokit.model.corpusComponent.CorpusComponent], bool] = <function VectorClassifier.<lambda>>) → convokit.model.corpus.Corpus
Annotate the corpus components with the classifier prediction and prediction score, with an optional selector that selects for objects to be classified. Objects that are not selected will get a metadata value of ‘None’ instead of the classifier prediction.
Parameters: - corpus – the target Corpus
- selector – a (lambda) function that takes a Corpus object and returns True or False (i.e. include / exclude). By default, the selector includes all objects of the specified type in the Corpus.
Returns: the target Corpus annotated
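After transform(), predictions can be read back from each object's metadata. The sketch below collects the ids of objects predicted positive, assuming each object exposes `id` and `meta` and has already been annotated. The stand-in objects are illustrative, not real ConvoKit components.

```python
from types import SimpleNamespace


def predicted_positive_ids(objs, clf_attribute_name="prediction"):
    """Return ids of objects whose stored prediction is truthy.

    Unselected objects hold None under this key, so they are skipped.
    """
    return [obj.id for obj in objs if obj.meta[clf_attribute_name]]


annotated = [
    SimpleNamespace(id="a", meta={"prediction": 1}),
    SimpleNamespace(id="b", meta={"prediction": None}),  # not selected
    SimpleNamespace(id="c", meta={"prediction": 0}),
]
positive_ids = predicted_positive_ids(annotated)
```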
-
transform_objs(objs: List[convokit.model.corpusComponent.CorpusComponent]) → List[convokit.model.corpusComponent.CorpusComponent]
Not implemented for VectorClassifier.