Conversation

class convokit.model.conversation.Conversation(owner, id: Optional[str] = None, utterances: Optional[List[str]] = None, meta: Optional[Dict] = None)

Represents a discrete subset of utterances in the dataset, connected by a reply-to chain.

Parameters
  • owner – The Corpus that this Conversation belongs to

  • id – The unique ID of this Conversation

  • utterances – A list of the IDs of the Utterances in this Conversation

  • meta – Table of initial values for conversation-level metadata

Variables
  • id – the ID of the Conversation

  • meta – A dictionary-like view object providing read-write access to conversation-level metadata.

add_meta(key: str, value) → None

Adds a key-value pair to the metadata of this Corpus component object.

Parameters
  • key – name of metadata attribute

  • value – value of metadata attribute

Returns

None

add_vector(vector_name: str)

Records, in this Corpus component object’s internal vectors list, that the object has a row associated with it in the vector matrix named vector_name. Transformers that add vectors to the Corpus should use this to update the relevant component objects during the transform() step.

Parameters

vector_name – name of vector matrix

Returns

None

check_integrity(verbose: bool = True) → bool

Check the integrity of this Conversation; i.e. do the constituent utterances form a complete reply-to chain?

Parameters

verbose – whether to print errors indicating the problems with the Conversation

Returns

True if the conversation structure is complete else False
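
To make "complete reply-to chain" concrete, the following is a minimal sketch that works on plain dictionaries rather than ConvoKit objects. The `reply_to` mapping and the `is_complete_chain` helper are hypothetical names, and this is not ConvoKit's implementation. It assumes a conversation is complete when exactly one utterance has no parent and every other utterance can be followed, reply by reply, back to that root.

```python
from typing import Dict, Optional


def is_complete_chain(reply_to: Dict[str, Optional[str]]) -> bool:
    """Return True if the utterances form a single rooted reply tree.

    `reply_to` maps each utterance ID to the ID it replies to,
    or None for the conversation root (hypothetical representation).
    """
    roots = [utt_id for utt_id, parent in reply_to.items() if parent is None]
    if len(roots) != 1:
        return False  # no root, or several disconnected roots

    # Every reply must point at an utterance inside the conversation.
    for parent in reply_to.values():
        if parent is not None and parent not in reply_to:
            return False

    # Every utterance must reach the root; a revisited ID means a cycle.
    root = roots[0]
    for utt_id in reply_to:
        seen = set()
        current = utt_id
        while current != root:
            if current in seen:
                return False
            seen.add(current)
            current = reply_to[current]
    return True


complete = {"u1": None, "u2": "u1", "u3": "u2"}
broken = {"u1": None, "u2": "missing_parent"}
print(is_complete_chain(complete), is_complete_chain(broken))
```

A check of this kind is useful after filtering a corpus, since dropping utterances can leave replies whose parents no longer exist.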

delete_vector(vector_name: str)

Delete a vector associated with this Corpus component object.

Parameters

vector_name – name of the vector to delete

Returns

None

get_chronological_speaker_list(selector: Callable[[convokit.model.speaker.Speaker], bool] = <function Conversation.<lambda>>)

Get the speakers in the conversation sorted in chronological order (speakers may appear more than once)

Parameters

selector – (lambda) function for which speakers should be included; all speakers are included by default

Returns

list of speakers for each chronological utterance

get_chronological_utterance_list(selector: Callable[[convokit.model.utterance.Utterance], bool] = <function Conversation.<lambda>>)

Get the utterances in the conversation sorted in increasing order of timestamp

Parameters

selector – function for which utterances should be included; all utterances are included by default

Returns

list of utterances, sorted by timestamp
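
The chronological getters combine two ideas: a selector that filters items and a sort on timestamp. The sketch below illustrates both with a stand-in `Utt` record; `Utt`, `chronological_utterances`, and `chronological_speakers` are hypothetical names used only for illustration and are not part of ConvoKit.

```python
from typing import Callable, Iterable, List, NamedTuple


class Utt(NamedTuple):
    """Stand-in for an utterance: only the fields this sketch needs."""
    id: str
    speaker: str
    timestamp: int


def chronological_utterances(
    utts: Iterable[Utt],
    selector: Callable[[Utt], bool] = lambda utt: True,
) -> List[Utt]:
    """Keep utterances accepted by `selector`, sorted by increasing timestamp."""
    return sorted((u for u in utts if selector(u)), key=lambda u: u.timestamp)


def chronological_speakers(utts: Iterable[Utt]) -> List[str]:
    """One speaker per utterance in time order, so speakers may repeat."""
    return [u.speaker for u in chronological_utterances(utts)]


utts = [
    Utt("u3", "alice", 30),
    Utt("u1", "alice", 10),
    Utt("u2", "bob", 20),
]
print([u.id for u in chronological_utterances(utts)])
print(chronological_speakers(utts))
print([u.id for u in chronological_utterances(utts, lambda u: u.speaker == "alice")])
```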

get_longest_paths() → List[List[convokit.model.utterance.Utterance]]

Finds the Utterances that form the longest path (i.e. root to leaf) in the Conversation tree. If multiple paths are tied for the longest length, all of them are returned as a list of lists. If only one such path exists, a list containing a single list of Utterances is returned.

Returns

a list of lists of Utterances

get_root_to_leaf_paths() → List[List[convokit.model.utterance.Utterance]]

Get the paths (stored as a list of lists of utterances) from the root to each of the leaves in the conversational tree

Returns

List of lists of Utterances
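
The relationship between root-to-leaf paths and longest paths can be sketched on a plain adjacency mapping. Here `children` maps an utterance ID to the IDs of its direct replies; that representation and both function names are assumptions made for illustration, not ConvoKit's internals.

```python
from typing import Dict, List


def root_to_leaf_paths(children: Dict[str, List[str]], root: str) -> List[List[str]]:
    """Enumerate every path from `root` down to a leaf (a node with no replies)."""
    paths: List[List[str]] = []
    stack: List[List[str]] = [[root]]
    while stack:
        path = stack.pop()
        replies = children.get(path[-1], [])
        if not replies:
            paths.append(path)  # reached a leaf, so the path is complete
        # Push in reverse so replies are expanded in their listed order.
        for reply in reversed(replies):
            stack.append(path + [reply])
    return paths


def longest_paths(paths: List[List[str]]) -> List[List[str]]:
    """Return all paths tied for the maximum length, as a list of lists."""
    if not paths:
        return []
    max_len = max(len(p) for p in paths)
    return [p for p in paths if len(p) == max_len]


children = {"u1": ["u2", "u3", "u6"], "u2": ["u4"], "u3": ["u5"]}
all_paths = root_to_leaf_paths(children, "u1")
print(all_paths)
print(longest_paths(all_paths))
```

In this example two paths tie at length three, so the longest-path result contains both, mirroring the list-of-lists return shape described above.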

get_speaker(speaker_id: str) → convokit.model.speaker.Speaker

Looks up the Speaker with the given ID. Raises a KeyError if no speaker with that ID exists.

Returns

the Speaker with the given speaker_id

get_speaker_ids() → List[str]

Produces a list of ids of all speakers in the Conversation, which can be used in calls to get_speaker() to retrieve specific speakers. Provides no ordering guarantees for the list.

Returns

a list of speaker ids

get_speakers_dataframe(selector: Optional[Callable[[convokit.model.speaker.Speaker], bool]] = <function Conversation.<lambda>>, exclude_meta: bool = False)

Get a DataFrame of the Speakers that have participated in the Conversation with fields and metadata attributes, with an optional selector that filters Speakers that should be included. Edits to the DataFrame do not change the corpus in any way.

Parameters
  • exclude_meta – whether to exclude metadata

  • selector – a (lambda) function that takes a Speaker and returns True or False (i.e. include / exclude). By default, the selector includes all Speakers in the Conversation.

Returns

a pandas DataFrame

get_subtree(root_utt_id)

Get the utterance node of the specified input id

Parameters

root_utt_id – id of the root node that the subtree starts from

Returns

UtteranceNode object

get_utterance(ut_id: str) → convokit.model.utterance.Utterance

Looks up the Utterance associated with the given ID. Raises a KeyError if no utterance by that ID exists.

Returns

the Utterance with the given ID

get_utterance_ids() → List[str]

Produces a list of the unique IDs of all utterances in the Conversation, which can be used in calls to get_utterance() to retrieve specific utterances. Provides no ordering guarantees for the list.

Returns

a list of IDs of Utterances in the Conversation

get_utterances_dataframe(selector=<function Conversation.<lambda>>, exclude_meta: bool = False)

Get a DataFrame of the Utterances in the Conversation with fields and metadata attributes. Set an optional selector that filters Utterances that should be included. Edits to the DataFrame do not change the corpus in any way.

Parameters
  • exclude_meta – whether to exclude metadata

  • selector – a (lambda) function that takes an Utterance and returns True or False (i.e. include / exclude). By default, the selector includes all Utterances in the Conversation.

Returns

a pandas DataFrame

get_vector(vector_name: str, as_dataframe: bool = False, columns: Optional[List[str]] = None)

Get the vector stored as vector_name for this object.

Parameters
  • vector_name – name of vector

  • as_dataframe – whether to return the vector as a dataframe (True) or in its raw array form (False). False by default.

  • columns – optional list of named columns of the vector to include. All columns are returned otherwise. This parameter is only used if as_dataframe is set to True.

Returns

a numpy / scipy array, or a DataFrame if as_dataframe is True

iter_speakers(selector: Callable[[convokit.model.speaker.Speaker], bool] = <function Conversation.<lambda>>) → Generator[convokit.model.speaker.Speaker, None, None]

Get Speakers that have participated in the Conversation, with an optional selector that filters for Speakers that should be included.

Parameters

selector – a (lambda) function that takes a Speaker and returns True or False (i.e. include / exclude). By default, the selector includes all Speakers in the Conversation.

Returns

a generator of Speakers

iter_utterances(selector: Callable[[convokit.model.utterance.Utterance], bool] = <function Conversation.<lambda>>) → Generator[convokit.model.utterance.Utterance, None, None]

Get utterances in the Conversation, with an optional selector that filters for Utterances that should be included.

Parameters

selector – a (lambda) function that takes an Utterance and returns True or False (i.e. include / exclude). By default, the selector includes all Utterances in the Conversation.

Returns

a generator of Utterances

print_conversation_stats()

Helper function for printing the number of Utterances and Speakers in the Conversation.

Returns

None (prints output)

print_conversation_structure(utt_info_func: Callable[[convokit.model.utterance.Utterance], str] = <function Conversation.<lambda>>, limit: int = None) → None

Prints an indented representation of utterances in the Conversation with conversation reply-to structure determining the indented level. The details of each utterance to be printed can be configured.

If limit is set to a value other than None, this will annotate utterances with an ‘order’ metadata indicating their temporal order in the conversation, where the first utterance in the conversation is annotated with 1.

Parameters
  • utt_info_func – callable function taking an utterance as input and returning a string of the desired utterance information. By default, this is a lambda function returning the utterance’s speaker’s id

  • limit – maximum number of utterances to print out. If set to k, the first k utterances are printed.

Returns

None. Prints to stdout.
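
The idea of indentation following reply depth can be sketched without ConvoKit. In the example below, `children` (utterance ID to reply IDs), `format_structure`, and the `info` callback are hypothetical stand-ins; `info` plays the role that utt_info_func plays above, and the limit behaviour is deliberately left out.

```python
from typing import Callable, Dict, List


def format_structure(
    children: Dict[str, List[str]],
    root: str,
    info: Callable[[str], str] = lambda utt_id: utt_id,
    indent: int = 4,
) -> List[str]:
    """Build one line per utterance, indented by its depth in the reply tree."""
    lines: List[str] = []

    def visit(node: str, depth: int) -> None:
        lines.append(" " * (indent * depth) + info(node))
        for reply in children.get(node, []):
            visit(reply, depth + 1)

    visit(root, 0)
    return lines


children = {"u1": ["u2", "u3"], "u2": ["u4"]}
speakers = {"u1": "alice", "u2": "bob", "u3": "carol", "u4": "alice"}

# Show each utterance's speaker, similar in spirit to the default described above.
for line in format_structure(children, "u1", info=lambda utt_id: speakers[utt_id]):
    print(line)
```

Returning the lines instead of printing them directly keeps the helper easy to test; the caller decides where the output goes.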

retrieve_meta(key: str)

Retrieves the value stored under the given key in the metadata of this Corpus component object.

Parameters

key – name of metadata attribute

Returns

value

traverse(traversal_type: str, as_utterance: bool = True)

Traverse the Conversation tree structure in breadth-first search (‘bfs’), depth-first search (‘dfs’), pre-order (‘preorder’), or post-order (‘postorder’) order.

Parameters
  • traversal_type – dfs, bfs, preorder, or postorder

  • as_utterance – whether the iterator should yield the utterance (True) or the utterance node (False)

Returns

an iterator of the utterances or utterance nodes
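
To show how the traversal orders differ on the same reply tree, here is a small sketch over a plain adjacency mapping. `traverse_tree` and `children` are hypothetical names, and the sketch covers only ‘bfs’, ‘preorder’, and ‘postorder’; how ConvoKit's ‘dfs’ option orders sibling replies is not assumed here.

```python
from collections import deque
from typing import Dict, Iterator, List


def traverse_tree(children: Dict[str, List[str]], root: str, traversal_type: str) -> Iterator[str]:
    """Yield utterance IDs from a reply tree in the requested order."""
    if traversal_type == "bfs":
        # Level by level: the root, then its replies, then replies to those.
        queue = deque([root])
        while queue:
            node = queue.popleft()
            yield node
            queue.extend(children.get(node, []))
    elif traversal_type == "preorder":
        # A node is yielded before any of its replies.
        yield root
        for reply in children.get(root, []):
            yield from traverse_tree(children, reply, "preorder")
    elif traversal_type == "postorder":
        # A node is yielded only after all of its replies.
        for reply in children.get(root, []):
            yield from traverse_tree(children, reply, "postorder")
        yield root
    else:
        raise ValueError(f"unsupported traversal_type: {traversal_type!r}")


children = {"u1": ["u2", "u3"], "u2": ["u4"]}
for order in ("bfs", "preorder", "postorder"):
    print(order, list(traverse_tree(children, "u1", order)))
```

Breadth-first order is convenient for reading a thread level by level, while post-order suits computations that need every reply processed before its parent, such as subtree sizes.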