TextProcessor

class convokit.text_processing.textProcessor.TextProcessor(proc_fn, output_field, input_field=None, aux_input=None, input_filter=None, verbosity=0)

A base class for Transformers that perform per-utterance computations, i.e., computing utterance-by-utterance features or representations.

Parameters
  • proc_fn – the function to apply to each utterance. It may take one of two signatures: proc_fn(input) or proc_fn(input, auxiliary_info).

  • input_field – If set to a string, the attribute of the utterance that proc_fn will take as input. If set to None, will default to reading utt.text. If set to a list of attributes, proc_fn will expect a dict of {attribute name: attribute value}.

  • output_field – If set to a string, the name of the attribute that the output of proc_fn will be written to. If set to a list, proc_fn should return a tuple in which each entry corresponds, by position, to a field in the list.

  • aux_input – any auxiliary input that proc_fn needs (e.g., a pre-loaded model); passed in as a dict.

  • input_filter – a boolean function with signature input_filter(utterance, aux_input). Attributes will only be computed for utterances for which input_filter returns True. By default, it always returns True, so attributes are computed for all utterances.

  • verbosity – frequency at which to print status messages when computing attributes.
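The input, output and auxiliary-input rules above can be modelled without ConvoKit itself. The following is a minimal, hypothetical sketch: ToyUtterance, read_input, call_proc_fn and write_output are illustrative names, not part of the ConvoKit API, and the rule used to choose between the one- and two-argument signatures of proc_fn (one argument when aux_input is empty) is an assumption.

```python
from typing import Any, Callable, Dict, List, Optional, Union


class ToyUtterance:
    """Hypothetical stand-in for a ConvoKit Utterance: text plus a metadata dict."""

    def __init__(self, text: str, meta: Optional[Dict[str, Any]] = None):
        self.text = text
        self.meta = dict(meta or {})


def read_input(utt: ToyUtterance, input_field: Union[None, str, List[str]]) -> Any:
    """None -> utt.text; str -> one attribute; list -> {attribute name: value}."""
    if input_field is None:
        return utt.text
    if isinstance(input_field, str):
        return utt.meta[input_field]
    return {name: utt.meta[name] for name in input_field}


def call_proc_fn(proc_fn: Callable[..., Any], value: Any, aux_input: Dict[str, Any]) -> Any:
    """Assumed dispatch: proc_fn(input) if aux_input is empty, else proc_fn(input, aux_input)."""
    if not aux_input:
        return proc_fn(value)
    return proc_fn(value, aux_input)


def write_output(utt: ToyUtterance, output_field: Union[str, List[str]], result: Any) -> None:
    """str -> store the result as one attribute; list -> store tuple entries by position."""
    if isinstance(output_field, str):
        utt.meta[output_field] = result
    else:
        for name, value in zip(output_field, result):
            utt.meta[name] = value


# One-argument proc_fn, default input (utt.text), single output field.
utt = ToyUtterance("Hello there world", meta={"lang": "en"})
write_output(utt, "n_words", call_proc_fn(lambda text: len(text.split()), read_input(utt, None), {}))


# Two-argument proc_fn with aux_input, list input_field, list output_field.
def tag(inputs: Dict[str, Any], aux: Dict[str, Any]):
    return inputs["lang"].upper(), aux["prefix"] + str(inputs["n_words"])


write_output(utt, ["lang_upper", "label"],
             call_proc_fn(tag, read_input(utt, ["lang", "n_words"]), {"prefix": "w"}))
print(utt.meta)  # {'lang': 'en', 'n_words': 3, 'lang_upper': 'EN', 'label': 'w3'}
```

Keeping input resolution, the proc_fn call and output writing as separate steps mirrors how the three parameters vary independently: any input_field shape can be combined with either proc_fn signature and either output_field shape.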

transform(corpus: convokit.model.corpus.Corpus) → convokit.model.corpus.Corpus

Computes per-utterance attributes for each utterance in the Corpus and stores them in the output_field of each utterance, as specified in the constructor. Utterances that lack any of the input_field attributes specified in the constructor, or for which input_filter returns False, are left unannotated.

Parameters

corpus – Corpus

Returns

the corpus
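The behaviour of transform (annotate every eligible utterance, skip the rest) amounts to a single loop. The following is a hypothetical, library-free sketch: a plain list of ToyUtterance objects stands in for a Corpus, toy_transform is an illustrative name rather than ConvoKit code, only the one-argument proc_fn form and a single string output_field are modelled, and reading verbosity as "print a progress line every N utterances" is an assumption.

```python
from typing import Any, Callable, Dict, Iterable, List, Optional, Union


class ToyUtterance:
    """Hypothetical stand-in for a ConvoKit Utterance: id, text and metadata."""

    def __init__(self, utt_id: str, text: str, meta: Optional[Dict[str, Any]] = None):
        self.id = utt_id
        self.text = text
        self.meta = dict(meta or {})


def toy_transform(
    utterances: Iterable[ToyUtterance],
    proc_fn: Callable[[Any], Any],
    output_field: str,
    input_field: Optional[Union[str, List[str]]] = None,
    input_filter: Optional[Callable[[ToyUtterance, Dict[str, Any]], bool]] = None,
    aux_input: Optional[Dict[str, Any]] = None,
    verbosity: int = 0,
) -> List[ToyUtterance]:
    """Annotate each utterance in place; skipped utterances are left untouched."""
    aux_input = aux_input or {}
    if input_filter is None:
        input_filter = lambda utt, aux: True  # default: consider every utterance
    fields = [input_field] if isinstance(input_field, str) else input_field
    processed = list(utterances)
    for idx, utt in enumerate(processed):
        # Assumed reading of verbosity: report progress every `verbosity` utterances.
        if verbosity > 0 and idx > 0 and idx % verbosity == 0:
            print(f"{idx} utterances processed")
        if not input_filter(utt, aux_input):
            continue  # rejected by input_filter
        if fields is not None and any(f not in utt.meta for f in fields):
            continue  # missing at least one input_field attribute
        if input_field is None:
            value = utt.text
        elif isinstance(input_field, str):
            value = utt.meta[input_field]
        else:
            value = {f: utt.meta[f] for f in fields}
        utt.meta[output_field] = proc_fn(value)
    return processed


utts = [
    ToyUtterance("u1", "short one"),
    ToyUtterance("u2", ""),
    ToyUtterance("u3", "a somewhat longer utterance"),
]
toy_transform(utts, lambda text: len(text.split()), "n_words",
              input_filter=lambda utt, aux: utt.text != "")
print([u.meta.get("n_words") for u in utts])  # [2, None, 4]
```

The empty utterance u2 is rejected by the filter and so never receives an n_words attribute, matching the "will not annotate" rule rather than storing a placeholder value.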

transform_utterance(utt, override_input_filter=False)

Computes per-utterance attributes for a single utterance or string. Utterances that lack any of the input_field attributes specified in the constructor, or for which input_filter returns False, are left unannotated. If a string is passed, it is converted to an utterance, and that utterance is returned, annotated if input_field was not set to None at initialization.

Parameters
  • utt – utterance or a string

  • override_input_filter – if True, ignore input_filter and compute the attribute regardless of what input_filter would return

Returns

the utterance
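The effect of override_input_filter can be shown with a small hypothetical sketch. toy_transform_utterance and ToyUtterance are illustrative names, not ConvoKit code; string inputs are not modelled; and the choice to keep the missing-attribute check in force even when the filter is overridden is an assumption.

```python
from typing import Any, Callable, Dict, Optional


class ToyUtterance:
    """Hypothetical stand-in for a ConvoKit Utterance: text plus metadata."""

    def __init__(self, text: str, meta: Optional[Dict[str, Any]] = None):
        self.text = text
        self.meta = dict(meta or {})


def always_true(utt: ToyUtterance, aux_input: Dict[str, Any]) -> bool:
    return True


def toy_transform_utterance(
    utt: ToyUtterance,
    proc_fn: Callable[[Any], Any],
    output_field: str,
    input_field: Optional[str] = None,
    input_filter: Callable[[ToyUtterance, Dict[str, Any]], bool] = always_true,
    aux_input: Optional[Dict[str, Any]] = None,
    override_input_filter: bool = False,
) -> ToyUtterance:
    """Annotate one utterance in place and return it."""
    aux_input = aux_input or {}
    # input_filter is consulted only when it has not been overridden.
    if not override_input_filter and not input_filter(utt, aux_input):
        return utt  # returned unannotated
    if input_field is None:
        value = utt.text
    elif input_field in utt.meta:
        value = utt.meta[input_field]
    else:
        return utt  # assumed: a missing input attribute still blocks annotation
    utt.meta[output_field] = proc_fn(value)
    return utt


def is_question(utt: ToyUtterance, aux_input: Dict[str, Any]) -> bool:
    return utt.text.endswith("?")


u = ToyUtterance("Thanks.")
toy_transform_utterance(u, str.lower, "lower", input_filter=is_question)
print(u.meta)  # {}
toy_transform_utterance(u, str.lower, "lower", input_filter=is_question,
                        override_input_filter=True)
print(u.meta)  # {'lower': 'thanks.'}
```

Because the utterance is annotated in place and also returned, callers can either keep their own reference or chain on the return value.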