Troubleshooting FAQ¶
General checks¶
Check that you are using the latest version of ConvoKit
Verify that your installed package dependencies for ConvoKit satisfy ConvoKit’s versioning requirements
If possible, use a Unix system, i.e. Mac OS or the Linux distros. We advise against using Windows, but Windows speakers may consider using the Windows subsystem for Linux (WSL) instead.
Issues¶
Error Associated with Numpy 2.0.0
The release of numpy 2.0.0 is exciting, yet it breaks backward compatibility for packages that are built with numpy 1.x. While our new release (ConvoKit 3.0.1) addresses the problem and is built against the new numpy, there are ConvoKit dependency packages that still experience issues with numpy 2.0.0+. We have fixed known errors from testing, but the tests are far from exhaustive. Therefore, if you face an error likely triggered by adapting to numpy 2.0.0+ versions, we encourage you to please submit an issue on our GitHub, so we can address the problem as soon as possible. Thank you!
For explanations of what errors numpy 2.0 could cause on packages that are not built against it, check the numpy 2.0 migration guide.
An example of an issue that we fixed is demonstrated below:
Pre-Spacy 3.8.2 is not compatible with numpy 2.0.0+ due to compatibility issues with thinc. Spacy 3.8.2 is compatible with numpy 2.0.0+ but currently requires thinc to be >=8.3.0, <8.4.0. As a temporary solution, ConvoKit now enforces spacy>=3.8.2, thinc >=8.3.0, <8.4.0. We will continue to monitor spacy releases and update the requirements if there are new releases targeting this issue.
For additional information about the issue, see: spaCy issue, thinc issue.
The issues are more likely to appear when you install ConvoKit in an existing environment where other packages have already been pre-installed. Installing ConvoKit in a new environment following our installation guide should result in no errors.
OSError: [E050] Can’t find model ‘en’. It doesn’t seem to be a shortcut link, a Python package or a valid path to a directory.
As mentioned in the installation instructions, one needs to run “python -m spacy download en” so that a model ‘en’ exists.
However, there is a secondary issue specific to Windows machines:
python -m spacy download en appears successful but actually fails to link the downloaded model to spaCy [Windows]
The output from the command suggests that linking is successful and that spacy.load(‘en_core_web_sm’) should succeed. However, a closer inspection of the first set of outputs reveals an error message: “You do not have sufficient privilege to perform this operation.”
The operation referred to is the linking of ‘en’. This issue has been raised here and has been acknowledged as a bug.
The solution is to either do the installation in a virtualenv (where the privileges required for linking is lower) or run powershell as administrator.
error: Microsoft Visual C++ 14.0 is required. when installing SpaCy [Windows]
SpaCy, one of ConvoKit’s dependencies, itself has an assortment of dependencies (e.g. murmurhash, cytoolz.) On Windows, these must be built using Microsoft Visual C++ build tools to be built properly.
The build tools can be downloaded from Microsoft here. Take care to not confuse these tools with other Microsoft distributables.
error: command ‘gcc’ failed with exit status 1 [Mac OS]
This is an error encountered when installing the SpaCy dependency for ConvoKit on MacOS. The solution is to link the required C++ standard library explicitly, like so:
>>> CFLAGS=-stdlib=libc++ python3 -m pip install convokit
urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed [Mac OS]
This is an error encountered when using the convokit.download()
function without having SSL certificates properly set up.
An explanation for this error is detailed in this site.
The two recommended fixes are to run:
>>> pip install --upgrade certifi
and if that doesn’t fix the issue, then run:
>>> open /Applications/Python\ 3.10/Install\ Certificates.command
(Substitute 3.10 in the above command with your current Python version (e.g. 3.11 or 3.12 or 3.13) if necessary.)
Immutability of Metadata Fields¶
Starting with 3.0, ConvoKit disallows mutation on Metadata fields to prevent unintended data loss and ensure the integrity of the corpus metadata backend storage. When accessing a Metadata field, a deep copy of the field is returned to prevent mutation changes to the copy from affecting the backend storage. This behavior is intended to ensure consistency between DB and MEM modes, since permitting mutations to mutable metadata fields in DB mode would solely modify the in-memory data without updating the database, thereby risking potential data loss.
Therefore, all metadata values must be treated as immutable. This does not really make a difference for primitive values like ints and strings,
since those are immutable in Python to begin with. However, code that relies on mutating a more complex type like lists or dictionaries may not work as expected.
For example, suppose the metadata entry "foo"
is a list type, and you access it by saving it to a Python variable as follows:
>>> saved_foo = my_utt.meta["foo"]
Because lists are considered mutable in Python, you might expect the following code to successfully add a new item in the foo
metadata of my_utt
:
>>> saved_foo.append("new value")
However, it will not work starting with the 3.0 version of ConvoKit; the code will run, but only the variable saved_foo
will be affected, not the actual metadata storage of my_utt
.
This is because saved_foo
only contains a copy of the data from the backend storage.
Thus, any operations that are done directly on saved_foo
are done only to the copy, and do not involve any backend storage writes.
It is therefore necessary to treat all metadata objects, regardless of type, as immutable.
Thus, the way to change metadata is the same way you would change an int or string type metadata entry: that is, by completely overwriting it.
For example, to achieve the desired effect with the "foo"
metadata entry from above, you should do the following:
>>> temp_foo = my_utt.meta["foo"]
>>> temp_foo.append("new value")
>>> my_utt.meta["foo"] = temp_foo
By adding the additional line of code that overwrites the "foo"
metadata entry, you are telling ConvoKit that you want to update the value of "foo"
in the storage’s metadata table with a new value, represented by temp_foo
which contains the new additional item.
Thus the contents of temp_foo
will get written to the backend storage as the new value of my_utt.meta["foo"]
, hence updating the metadata as desired.