ConvoKitMatrix

class convokit.model.convoKitMatrix.ConvoKitMatrix(name, matrix, ids: Optional[List[str]] = None, columns: Optional[List[str]] = None)

A ConvoKitMatrix stores the vector representations of some set of Corpus components (i.e. Utterances, Conversations, Speakers).

Parameters
  • name – descriptive name for the matrix

  • matrix – the matrix data, as a numpy array or scipy sparse matrix

  • ids – optional list of Corpus component object ids, one per row of the matrix, in row order

  • columns – optional list of names for the columns of the matrix

Variables
  • name – name of the matrix

  • matrix – the matrix data

  • ids – ids corresponding to rows

  • columns – names corresponding to columns

  • ids_to_idx – a mapping from id to the row index

  • cols_to_idx – a mapping from column name to the column index
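
The relationship between ids, columns, and the two index mappings can be sketched with plain numpy. This is an illustrative stand-in, not ConvoKit's own code: the utterance ids, column names, and values are made up, and the dictionaries only mimic what ids_to_idx and cols_to_idx are documented to hold.

```python
import numpy as np

# Hypothetical data: three utterance ids and two named feature columns.
ids = ["utt_1", "utt_2", "utt_3"]
columns = ["length", "politeness"]
matrix = np.array([[12.0, 0.8],
                   [5.0, 0.1],
                   [30.0, 0.5]])

# Stand-ins for the documented ids_to_idx and cols_to_idx variables:
# each id maps to its row index, each column name to its column index.
ids_to_idx = {id_: row for row, id_ in enumerate(ids)}
cols_to_idx = {col: j for j, col in enumerate(columns)}

# Look up a single value by id and column name.
value = matrix[ids_to_idx["utt_2"], cols_to_idx["politeness"]]
print(value)  # 0.1
```

Keeping the mappings alongside the raw array lets callers address rows and columns by name while the data itself stays a plain numeric matrix.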

dump(dirpath)

Dumps the ConvoKitMatrix as a pickle file.

Parameters

dirpath – directory path to Corpus

Returns

None

static from_dir(dirpath, matrix_name)

Initialize a ConvoKitMatrix of the specified matrix_name from a specified directory dirpath.

Parameters
  • dirpath – path to Corpus directory

  • matrix_name – name of vector matrix

Returns

the initialized ConvoKitMatrix
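
dump and from_dir pair up as a save-and-restore round trip through a pickle file in the Corpus directory. The sketch below shows that pattern with the standard library only. The helper names dump_matrix and load_matrix are hypothetical, the pickled payload is a plain dict standing in for a real ConvoKitMatrix, and the file name follows the "vector.[name].p" form mentioned for from_file; the exact file layout ConvoKit uses may differ.

```python
import os
import pickle
import tempfile


def dump_matrix(obj, name, dirpath):
    """Hypothetical helper: pickle `obj` into dirpath under a name-based file name."""
    path = os.path.join(dirpath, f"vector.{name}.p")
    with open(path, "wb") as f:
        pickle.dump(obj, f)
    return path


def load_matrix(dirpath, name):
    """Hypothetical helper: load the object previously pickled for `name`."""
    with open(os.path.join(dirpath, f"vector.{name}.p"), "rb") as f:
        return pickle.load(f)


# A temporary directory stands in for the Corpus directory.
with tempfile.TemporaryDirectory() as corpus_dir:
    # Plain dict used as a stand-in for a ConvoKitMatrix (an assumption).
    payload = {"name": "bow", "ids": ["utt_1"], "matrix": [[1.0, 0.0]]}
    saved_path = dump_matrix(payload, "bow", corpus_dir)
    restored = load_matrix(corpus_dir, "bow")
```

As with any pickle file, only load files from sources you trust.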

static from_file(filepath)

Initialize a ConvoKitMatrix from a file of form “vector.[name].p”.

Parameters

filepath – path to the matrix file

Returns

the initialized ConvoKitMatrix

get_vectors(ids: Optional[List[str]] = None, columns: Optional[List[str]] = None, as_dataframe: bool = False)

Get the vectors for the specified ids and columns.

Parameters
  • ids – optional list of object ids to get vectors for; all by default

  • columns – optional list of named columns of the vector to include; all by default

  • as_dataframe – whether to return the vector as a dataframe (True) or in its raw array form (False). False by default.

Returns

a vector matrix (either np.ndarray or csr_matrix) or a pandas dataframe
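
Selecting by ids and columns amounts to translating names into row and column indices and slicing the array. The following is a minimal sketch of that idea for a dense numpy array; select_vectors is a hypothetical helper, not the library's implementation, and it does not cover the sparse or dataframe cases.

```python
import numpy as np


def select_vectors(matrix, ids_to_idx, cols_to_idx, ids=None, columns=None):
    """Hypothetical helper: rows for `ids`, columns for `columns`; all by default."""
    rows = list(range(matrix.shape[0])) if ids is None else [ids_to_idx[i] for i in ids]
    cols = list(range(matrix.shape[1])) if columns is None else [cols_to_idx[c] for c in columns]
    # np.ix_ builds an index grid so the result has shape (len(rows), len(cols)).
    return matrix[np.ix_(rows, cols)]


# Made-up data for illustration.
matrix = np.array([[12.0, 0.8],
                   [5.0, 0.1],
                   [30.0, 0.5]])
ids_to_idx = {"utt_1": 0, "utt_2": 1, "utt_3": 2}
cols_to_idx = {"length": 0, "politeness": 1}

# Rows come back in the order the ids were requested.
result = select_vectors(matrix, ids_to_idx, cols_to_idx,
                        ids=["utt_3", "utt_1"], columns=["politeness"])
print(result.tolist())  # [[0.5], [0.8]]
```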

static hstack(name: str, matrices: List[ConvoKitMatrix])

Combines multiple ConvoKitMatrices into a single ConvoKitMatrix by stacking them horizontally (i.e. each constituent matrix must have the same ids).

Parameters
  • name – name of new matrix

  • matrices – constituent ConvoKitMatrices

Returns

a new ConvoKitMatrix
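
Horizontal stacking keeps the rows (ids) fixed and appends columns. The numpy sketch below illustrates the shape arithmetic under that assumption; the ids, column names, and values are invented, and the arrays stand in for the matrix data of two ConvoKitMatrices that share the same ids.

```python
import numpy as np

# Both blocks describe the same two ids, in the same row order.
ids = ["utt_1", "utt_2"]
left = np.array([[1.0, 2.0],
                 [3.0, 4.0]])
left_cols = ["a", "b"]
right = np.array([[5.0],
                  [6.0]])
right_cols = ["c"]

# Rows are unchanged; columns are concatenated left to right.
stacked = np.hstack([left, right])
stacked_cols = left_cols + right_cols

print(stacked.shape)  # (2, 3)
print(stacked_cols)   # ['a', 'b', 'c']
```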

subset(ids: Optional[List[str]] = None, columns: Optional[List[str]] = None)

Get a copy of the ConvoKitMatrix restricted to the specified subset of ids and columns.

Parameters
  • ids – list of ids to be included in the subset; all by default

  • columns – list of columns to be included in the subset; all by default

Returns

a new ConvoKitMatrix object containing the requested subset

to_dataframe() → pandas.DataFrame

Converts the matrix of vectors into a pandas DataFrame.

Returns

a pandas DataFrame

static vstack(name: str, matrices: List[ConvoKitMatrix])

Combines multiple ConvoKitMatrices into a single ConvoKitMatrix by stacking them vertically (i.e. each constituent matrix must have the same columns).

Parameters
  • name – name of new matrix

  • matrices – constituent ConvoKitMatrices

Returns

a new ConvoKitMatrix
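
Vertical stacking keeps the columns fixed and appends rows. The numpy sketch below shows the shape arithmetic under that assumption; the ids, column names, and values are invented, and the arrays stand in for the matrix data of two ConvoKitMatrices that share the same columns.

```python
import numpy as np

# Both blocks use the same two columns, in the same order.
columns = ["a", "b"]
top = np.array([[1.0, 2.0]])
top_ids = ["utt_1"]
bottom = np.array([[3.0, 4.0],
                   [5.0, 6.0]])
bottom_ids = ["utt_2", "utt_3"]

# Columns are unchanged; rows (and their ids) are concatenated top to bottom.
stacked = np.vstack([top, bottom])
stacked_ids = top_ids + bottom_ids

print(stacked.shape)  # (3, 2)
print(stacked_ids)    # ['utt_1', 'utt_2', 'utt_3']
```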