Basic

Implements a few basic transformers: a tokenizer and a per-utterance word counter.

class convokit.basic.tokenizer.Tokenizer(verbosity: int = 0)

Tokenizes utterances and stores the tokens as a space-separated string.

Parameters

verbosity – how frequently to print status messages while tokenizing.

transform(corpus: convokit.model.corpus.Corpus)

Tokenizes each utterance and stores its tokens as a space-separated string entry in the utterance's metadata.

Parameters

corpus (Corpus) – the Corpus to tokenize utterances for.
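As a rough illustration of what transform does, the following sketch mimics the behaviour on a stand-in utterance type instead of a real Corpus. It does not use ConvoKit: SimpleUtterance, simple_tokenize, the metadata key "tokens", and the regex tokenizer (standing in for the NLTK tokenization the real class relies on) are all hypothetical, chosen only for illustration. The verbosity handling assumes a message is printed every `verbosity` utterances.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SimpleUtterance:
    """Hypothetical stand-in for a ConvoKit Utterance: raw text plus a metadata dict."""
    id: str
    text: str
    meta: Dict[str, object] = field(default_factory=dict)


# Hypothetical tokenizer: words (with an optional internal apostrophe part) or single
# punctuation marks. The real Tokenizer relies on NLTK; this regex only approximates it.
_TOKEN_RE = re.compile(r"\w+(?:'\w+)?|[^\w\s]")


def simple_tokenize(utterances: List[SimpleUtterance], verbosity: int = 0) -> List[SimpleUtterance]:
    """Store each utterance's tokens as a space-separated string under meta["tokens"]."""
    for idx, utt in enumerate(utterances, start=1):
        tokens = _TOKEN_RE.findall(utt.text)
        utt.meta["tokens"] = " ".join(tokens)
        # Assumed semantics: print a status line every `verbosity` utterances (0 disables it).
        if verbosity > 0 and idx % verbosity == 0:
            print(f"{idx} utterances tokenized")
    return utterances


utts = [SimpleUtterance("u1", "Hello, world!"), SimpleUtterance("u2", "It's fine.")]
simple_tokenize(utts)
print(utts[0].meta["tokens"])  # Hello , world !
```

Storing the tokens as one space-separated string (rather than a list) keeps the metadata value a plain string, so a later step can recover the tokens with a simple split.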

class convokit.basic.wordcount.WordCount(use_tokenized=True)

Computes the word count of each utterance.

Parameters

use_tokenized – whether to use the NLTK-tokenized output (requires the Tokenizer to be run first).

transform(corpus: convokit.model.corpus.Corpus)

Computes the word count of each utterance.

Parameters

corpus (Corpus) – the Corpus to compute word counts for.
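A similarly hedged sketch of the word-count logic follows. When use_tokenized is true, it counts the space-separated tokens assumed to have been stored by an earlier tokenizing pass. Otherwise it falls back to splitting the raw text on whitespace. The SimpleUtterance type, the metadata keys "tokens" and "wordcount", and the ValueError raised when tokens are missing are all hypothetical, not ConvoKit's actual names or behaviour.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SimpleUtterance:
    """Hypothetical stand-in for a ConvoKit Utterance: raw text plus a metadata dict."""
    id: str
    text: str
    meta: Dict[str, object] = field(default_factory=dict)


def simple_wordcount(utterances: List[SimpleUtterance], use_tokenized: bool = True) -> List[SimpleUtterance]:
    """Store a per-utterance word count under the hypothetical key meta["wordcount"]."""
    for utt in utterances:
        if use_tokenized:
            # Assumes a tokenizing pass already stored a space-separated string in meta["tokens"].
            if "tokens" not in utt.meta:
                raise ValueError(f"utterance {utt.id} has no 'tokens' metadata; tokenize first")
            words = str(utt.meta["tokens"]).split()
        else:
            # Fallback: plain whitespace splitting of the raw text.
            words = utt.text.split()
        utt.meta["wordcount"] = len(words)
    return utterances


utts = [SimpleUtterance("u1", "Hello, world!", meta={"tokens": "Hello , world !"})]
simple_wordcount(utts)
print(utts[0].meta["wordcount"])  # 4 (punctuation tokens are counted)
simple_wordcount(utts, use_tokenized=False)
print(utts[0].meta["wordcount"])  # 2
```

The two modes can disagree: in this sketch the tokenized count includes punctuation tokens, while whitespace splitting keeps punctuation attached to neighbouring words.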