
NLP - Match topic to document

We are processing large lists of documents (similar to product descriptions) and want to figure out whether they refer to a given topic (e.g. gambling). Our current approach is to manually define a set of keywords and then use spaCy's PhraseMatcher to find any hits. We match on the built-in token attributes such as LOWER and LEMMA. Nevertheless, the process is not very efficient. Are there any other libraries available? Or fundamentally different approaches?
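For reference, the keyword approach described above looks roughly like the following. This is a minimal sketch assuming spaCy v3; the keyword list and the `matched_terms` helper are made up for illustration and are not our real configuration.

```python
import spacy
from spacy.matcher import PhraseMatcher

# A blank pipeline only tokenizes, so no pre-trained model download is needed.
nlp = spacy.blank("en")

# attr="LOWER" makes the comparison case-insensitive.
matcher = PhraseMatcher(nlp.vocab, attr="LOWER")

# Hypothetical keyword list for the "gambling" topic.
GAMBLING_TERMS = ["casino", "slot machine", "sports betting"]
matcher.add("GAMBLING", [nlp.make_doc(term) for term in GAMBLING_TERMS])


def matched_terms(text):
    """Return the sorted, lowercased keyword hits found in the text."""
    doc = nlp.make_doc(text)
    return sorted({doc[start:end].text.lower() for _, start, end in matcher(doc)})


print(matched_terms("Online Casino with 500 Slot Machine games"))
```

Matching on LEMMA instead of LOWER would require a pipeline with a lemmatizer (a pre-trained model per language) rather than a blank one.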

As we don't have the data to train a model ourselves, we are looking for pre-trained models.

We have also tried NLTK's stemmers (Lancaster and Snowball).

One further requirement is multilingual support: the texts are in English, German, Italian and French.

You might want to consider adding a text categorizer to your spaCy pipeline. See details at https://spacy.io/usage/training#textcat
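A rough sketch of what that could look like, assuming spaCy v3. Note that the text categorizer has to be trained on labelled examples; the four toy examples, the label names and the `train_toy_textcat` helper below are hypothetical and only show the mechanics, they are far too few to give a useful model.

```python
import spacy
from spacy.training import Example

# Hypothetical labelled examples; a real model needs many more per language.
TRAIN_DATA = [
    ("Online casino with slot machines and poker", {"GAMBLING": 1.0, "OTHER": 0.0}),
    ("Place your sports bets and win big", {"GAMBLING": 1.0, "OTHER": 0.0}),
    ("Handmade wooden chair for the living room", {"GAMBLING": 0.0, "OTHER": 1.0}),
    ("Stainless steel kitchen knife set", {"GAMBLING": 0.0, "OTHER": 1.0}),
]


def train_toy_textcat(epochs=10):
    """Train a tiny text categorizer on the toy data and return the pipeline."""
    nlp = spacy.blank("en")
    textcat = nlp.add_pipe("textcat")  # single-label, mutually exclusive classes
    textcat.add_label("GAMBLING")
    textcat.add_label("OTHER")

    examples = [
        Example.from_dict(nlp.make_doc(text), {"cats": cats})
        for text, cats in TRAIN_DATA
    ]
    optimizer = nlp.initialize(lambda: examples)
    for _ in range(epochs):
        nlp.update(examples, sgd=optimizer)
    return nlp


nlp = train_toy_textcat()
doc = nlp("Free spins at our new casino")
print(doc.cats)  # one score per label
```

For the other languages you would build one pipeline per language (for example `spacy.blank("de")`) with its own training examples.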
