Abstract base class for modules that take in a Corpus and modify the Corpus and/or extend it with additional information, imitating the scikit-learn Transformer API. Exposes fit() and transform() methods: fit() performs any necessary precomputation (or “training” in machine learning parlance), while transform() does the work of actually computing the modification and applying it to the Corpus.
All subclasses must implement transform(); subclasses that require precomputation should also override fit(), which by default does nothing. The interface also exposes a fit_transform() method that does both steps on the same Corpus in one call. By default this is implemented to simply call fit() followed by transform(), but designers of Transformer subclasses may override the default implementation in cases where the combined operation can be implemented more efficiently than doing the steps separately.
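The contract described above can be sketched without ConvoKit itself. In the example below, ToyCorpus, ToyTransformer, and RareWordCounter are hypothetical stand-ins written for illustration only; they are not ConvoKit classes, and the real Corpus object has a much richer structure. The sketch assumes a default fit() that does nothing and a fit_transform() that fits and then transforms on the same corpus.

```python
from abc import ABC, abstractmethod
from collections import Counter


class ToyCorpus:
    """Hypothetical stand-in for a Corpus: utterance texts plus per-utterance metadata."""

    def __init__(self, texts):
        self.texts = list(texts)
        self.meta = [{} for _ in self.texts]


class ToyTransformer(ABC):
    """Stand-in mirroring the fit/transform contract (not ConvoKit's own class)."""

    def fit(self, corpus, y=None, **kwargs):
        # Default: no precomputation is needed, so just return self.
        return self

    @abstractmethod
    def transform(self, corpus, **kwargs):
        """Subclasses must implement the actual modification."""

    def fit_transform(self, corpus, y=None, **kwargs):
        # Default: run both steps on the same corpus.
        self.fit(corpus, y=y, **kwargs)
        return self.transform(corpus, **kwargs)


class RareWordCounter(ToyTransformer):
    """Annotates each utterance with how many of its words occurred exactly once during fit()."""

    def __init__(self):
        self.rare_words = None

    def fit(self, corpus, y=None, **kwargs):
        # Precomputation step: collect corpus-wide word counts.
        counts = Counter(w for text in corpus.texts for w in text.lower().split())
        self.rare_words = {w for w, c in counts.items() if c == 1}
        return self

    def transform(self, corpus, **kwargs):
        if self.rare_words is None:
            raise RuntimeError("call fit() before transform()")
        # Modification step: write the annotation into the corpus itself.
        for text, meta in zip(corpus.texts, corpus.meta):
            meta["n_rare"] = sum(w in self.rare_words for w in text.lower().split())
        return corpus


corpus = ToyCorpus(["hello there", "hello again friend"])
result = RareWordCounter().fit_transform(corpus)
```

A subclass that needs no precomputation would implement only transform() and inherit the no-op fit().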
fit(corpus: convokit.model.corpus.Corpus, y=None, **kwargs)
Use the provided Corpus to perform any precomputation needed for the later transformation step.
Parameters: corpus – the Corpus to use for fitting
Returns: the fitted Transformer
fit_transform(corpus: convokit.model.corpus.Corpus, y=None, **kwargs) → convokit.model.corpus.Corpus
Fit and run the Transformer on a single Corpus.
Parameters: corpus – the Corpus to use
Returns: same as transform()
transform(corpus: convokit.model.corpus.Corpus, **kwargs) → convokit.model.corpus.Corpus
Modify the provided Corpus. This is an abstract method that must be implemented by any Transformer subclass.
Parameters: corpus – the Corpus to transform
Returns: the modified version of the input Corpus. Note that unlike the scikit-learn equivalent, transform() operates in place on the Corpus (though for convenience and compatibility with scikit-learn, it also returns the modified Corpus).
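The in-place behavior noted above means the returned object and the input are the same object, so callers who need the pre-transform state have to copy it first. The sketch below illustrates this with hypothetical stand-ins (ToyCorpus and LengthAnnotator are not ConvoKit classes); whether copy.deepcopy is a suitable way to snapshot a real Corpus is an assumption not covered here.

```python
import copy


class ToyCorpus:
    """Hypothetical stand-in for a Corpus: utterance texts plus per-utterance metadata."""

    def __init__(self, texts):
        self.texts = list(texts)
        self.meta = [{} for _ in self.texts]


class LengthAnnotator:
    """Hypothetical transformer that follows the in-place convention."""

    def transform(self, corpus, **kwargs):
        for text, meta in zip(corpus.texts, corpus.meta):
            meta["n_words"] = len(text.split())
        # Return the same object that was passed in, so calls can be chained.
        return corpus


original = ToyCorpus(["a b c", "d e"])
snapshot = copy.deepcopy(original)  # untouched copy, taken before transforming

returned = LengthAnnotator().transform(original)

print(returned is original)  # the input itself was modified and handed back
print(original.meta)         # annotations are visible on the original
print(snapshot.meta)         # the earlier copy is unaffected
```

Because of this convention, `corpus = t.transform(corpus)` and a bare `t.transform(corpus)` leave `corpus` in the same state.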