Spacy Matcher: TypeError: an integer is required

Question

I'm trying to create a phrase matcher with spaCy, but I keep getting the error TypeError: an integer is required. My 'classes' are the rows of a column I selected from my database. I don't understand why an integer is required, since the documentation seems to do the same thing I am doing, but whenever I run the code I get an error at matcher.add. Any ideas would be appreciated. This is my code:

import pandas as pd
import spacy
from spacy.matcher import PhraseMatcher
from nltk.tokenize import word_tokenize, sent_tokenize
import nltk

data = pd.read_csv('C:/woclorev.csv')

class_name = data['Class Name'].drop_duplicates()
class_name_str = class_name.tolist()

reviews = data['Reviewtext'].astype(str)
token_rev = reviews.apply(word_tokenize)

#PhraseMatcher object
matcher = PhraseMatcher(nlp.vocab, attr='LOWER')
matcher.add('Classes', None, *class_name_str)
matches = matcher(token_rev)

This is the full error message:

File "", line 1, in
File "phrasematcher.pyx", line 209, in spacy.matcher.phrasematcher.PhraseMatcher.add
TypeError: an integer is required

Answer 1

From the docs:

The PhraseMatcher lets you efficiently match large terminology lists. While the Matcher lets you match sequences based on lists of token descriptions, the PhraseMatcher accepts match patterns in the form of Doc objects.

Judging by the import, word_tokenize is NLTK's implementation. It returns a plain list of strings, not spaCy Doc objects, so passing its output (or plain strings in general) to the matcher will likely raise an exception.
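A minimal sketch of what that means in practice, assuming spaCy v3 (the list form of add() should also work from v2.2.2 on) and using spacy.blank("en") as a stand-in for whatever nlp the question loads. The class names and review strings are made up for illustration. Each review is turned into a Doc with nlp.pipe(), and the matcher is called on one Doc at a time instead of on a Series of NLTK token lists:

```python
import spacy
from spacy.matcher import PhraseMatcher

# Assumption: a blank English pipeline is enough here, because the
# PhraseMatcher with attr="LOWER" only needs tokenization.
nlp = spacy.blank("en")

matcher = PhraseMatcher(nlp.vocab, attr="LOWER")
# Patterns must be Doc objects, not plain strings (hypothetical class names).
matcher.add("Classes", [nlp.make_doc(name) for name in ["Dresses", "Knits"]])

# Hypothetical review texts standing in for data['Reviewtext'].
reviews = ["Love these dresses!", "The knits run small.", "Great jeans."]

# Call the matcher on spaCy Docs, one review at a time.
found = []
for doc in nlp.pipe(reviews):
    found.append([doc[start:end].text for _, start, end in matcher(doc)])

print(found)
```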

Answer 2

The problem is that your matcher is not receiving a list of Doc objects, but a list of strings.

The arguments of matcher.add() in spaCy v2 are:

A match ID (a string key) for your patterns;
An optional on_match callback function, or None;
The patterns themselves, as Doc objects (unpacked with *patterns).
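In spaCy v3 the signature changed: the patterns go in as a single list, and the callback becomes the keyword argument on_match. Here is a hedged sketch showing all three pieces together; spacy.blank("en") and the HEROES key are only for illustration. The callback receives the matcher, the Doc, the index of the current match, and the full list of matches:

```python
import spacy
from spacy.matcher import PhraseMatcher

nlp = spacy.blank("en")  # assumption: tokenizer-only pipeline
matcher = PhraseMatcher(nlp.vocab)  # default attr is ORTH (case-sensitive)

hits = []

def on_match(matcher, doc, i, matches):
    # spaCy calls this once per match; matches[i] is (match_id, start, end).
    match_id, start, end = matches[i]
    hits.append((nlp.vocab.strings[match_id], doc[start:end].text))

patterns = [nlp.make_doc("Tony Stark"), nlp.make_doc("Batman")]
# 1) match ID, 2) list of Doc patterns, 3) optional callback (keyword argument)
matcher.add("HEROES", patterns, on_match=on_match)

doc = nlp("Tony Stark met Batman.")
matches = matcher(doc)
print(hits)
```

The older v2 call matcher.add("HEROES", on_match, *patterns) passes the same three things positionally.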

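You can convert each phrase in the list into a Doc object with the nlp.make_doc() method. This is faster than calling nlp() on each phrase, because make_doc() only runs the tokenizer and skips the rest of the pipeline.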

# Terms to match
terms_list = ['Bruce Wayne', 'Tony Stark', 'Batman', 'Harry Potter', 'Severus Snape']

# Make a list of docs
patterns = [nlp.make_doc(text) for text in terms_list]

# Add to the matcher (spaCy v2 signature; in v3 this is matcher.add("phrase_matcher", patterns))
matcher.add("phrase_matcher", None, *patterns)

Reference: https://www.machinelearningplus.com/spacy-tutorial-nlp/
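Applied to the question's data, the whole fix could look roughly like the sketch below. It assumes spaCy v3 and a blank English pipeline, and uses hard-coded lists as hypothetical stand-ins for data['Class Name'] and data['Reviewtext'] (with pandas you would get them from .drop_duplicates().tolist() and .astype(str).tolist()). For longer term lists, nlp.tokenizer.pipe() tokenizes the names as a stream, which does the same job as calling make_doc() on each one:

```python
import spacy
from spacy.matcher import PhraseMatcher

nlp = spacy.blank("en")  # assumption: tokenization alone is enough

# Hypothetical stand-ins for the DataFrame columns in the question.
class_names = ["Dresses", "Blouses", "Dresses", "Fine gauge"]
reviews = ["These blouses are lovely.", "Fine gauge sweater, nice.", "Meh."]

# Deduplicate while keeping order, then turn every name into a Doc.
unique_names = list(dict.fromkeys(str(name) for name in class_names))
patterns = list(nlp.tokenizer.pipe(unique_names))

matcher = PhraseMatcher(nlp.vocab, attr="LOWER")
matcher.add("Classes", patterns)  # v3 signature; old v2: add("Classes", None, *patterns)

# Map each review to the class names it mentions.
results = {}
for text, doc in zip(reviews, nlp.pipe(reviews)):
    results[text] = [doc[start:end].text for _, start, end in matcher(doc)]

print(results)
```

Passing attr="LOWER" is what lets "blouses" in a review match the "Blouses" class; leave it out for case-sensitive matching.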

Answer 1: score 1, posted 2020-05-22 01:56:11
Answer 2: score 0, posted 2020-09-01 19:23:47
