Cumulative Bag-of-Words Model

class convokit.forecaster.cumulativeBoW.CumulativeBoW(vectorizer=None, clf_model=None, use_tokens=False, forecast_attribute_name: str = 'prediction', forecast_prob_attribute_name: str = 'score')

A cumulative bag-of-words forecasting model.
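As a rough illustration of what "cumulative" means here, the sketch below builds one bag-of-words count per turn, each covering every utterance seen so far. It is a standard-library toy and not ConvoKit's implementation; the helper name `cumulative_bow`, the whitespace tokenization, and the sample conversation are all assumptions made for the example.

```python
from collections import Counter


def cumulative_bow(utterances):
    """Yield one word-count Counter per turn, covering all turns so far.

    Hypothetical helper for illustration only; tokenization is a plain
    lowercase whitespace split.
    """
    running = Counter()
    for text in utterances:
        running.update(text.lower().split())
        # Yield a copy so earlier snapshots are not mutated by later turns.
        yield Counter(running)


# Made-up conversation, used only to show how the counts accumulate.
conversation = ["I disagree with you", "you are wrong", "no you are"]
snapshots = list(cumulative_bow(conversation))

for turn, bag in enumerate(snapshots, start=1):
    print(turn, dict(bag))
```

Each snapshot could then be turned into a feature vector for a classifier, which is the general shape of a bag-of-words forecaster.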

Parameters
  • vectorizer – optional vectorizer; defaults to a scikit-learn CountVectorizer (min_df=10, max_df=0.5, ngram_range=(1, 1), max_features=15000)

  • clf_model – optional classifier model; default standard-scaled logistic regression

  • use_tokens – if using the default vectorizer, set this to True if the input is already tokenized

  • forecast_attribute_name – name for DataFrame column containing predictions, default: “prediction”

  • forecast_prob_attribute_name – name for column containing prediction scores, default: “score”
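The documented defaults can be approximated with plain scikit-learn objects, which may be a useful starting point when supplying a custom `vectorizer` or `clf_model`. This is a minimal sketch rather than the library's own construction code: the pipeline step names, the use of `with_mean=False`, and any solver settings are assumptions.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Vectorizer mirroring the documented default parameters.
vectorizer = CountVectorizer(
    min_df=10,
    max_df=0.5,
    ngram_range=(1, 1),
    max_features=15000,
)

# "Standard-scaled logistic regression" expressed as a pipeline.
# Assumption: with_mean=False, so sparse count matrices can be scaled
# without being converted to dense arrays.
clf_model = Pipeline(
    [
        ("scaler", StandardScaler(with_mean=False)),
        ("logreg", LogisticRegression()),
    ]
)
```

Objects like these could then be passed through the `vectorizer` and `clf_model` arguments shown in the class signature. Note that `min_df=10` drops any term appearing in fewer than ten documents, so a very small corpus may need a lower value.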

forecast(id_to_context_reply_label)

Use the forecaster model to compute forecasts and scores for the given context-reply pairs, and return the results as a DataFrame.

train(id_to_context_reply_label)

Train the forecaster model on the given context-reply-label tuples.