Hi, is there an efficient way to tag parts of speech in very large files?
import collections

import nltk
import pandas as pd

df = pd.read_csv("data.csv")                    # hypothetical input file and column name
text = " ".join(df["text"].astype(str))         # word_tokenize expects a string, not a DataFrame
tokens = nltk.word_tokenize(text)
tags = nltk.pos_tag(tokens)
counts = collections.Counter(tag for word, tag in tags)  # Counter, not counter
I am trying to find the most common parts of speech in a file and don't know a better way of doing this.
Typically you need to work around three things: the Python-level for loop, potentially high memory load, and potentially high CPU load.
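For the memory side, a minimal sketch of an incremental approach: stream the file and update a Counter as you go, so you never hold the full token and tag lists at once (the path and line-by-line plain-text layout are assumptions here; tagging per line also ignores sentence boundaries, which is usually acceptable for rough tag counts):

import collections

import nltk

def pos_counts_streaming(path):
    # Tag one line at a time so only one line's tokens are ever in memory.
    counts = collections.Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            tokens = nltk.word_tokenize(line)
            if tokens:
                counts.update(tag for _, tag in nltk.pos_tag(tokens))
    return counts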
For the CPU side, here's an example of distributed part-of-speech tagging using Python and execnet.
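A minimal sketch of that idea, assuming execnet is installed and NLTK is importable in the worker interpreters; n_workers and batch_size are illustrative values, not tuned defaults:

import collections

import execnet

def remote_pos_tag(channel):
    # Runs inside each worker interpreter; NLTK must be installed there too.
    import collections
    import nltk
    while True:
        batch = channel.receive()
        if batch is None:                       # sentinel: this worker is done
            break
        tags = nltk.pos_tag(batch)
        counts = collections.Counter(tag for _, tag in tags)
        channel.send(dict(counts))              # ship back per-batch tag counts

def distributed_pos_counts(tokens, n_workers=4, batch_size=10000):
    group = execnet.Group()
    try:
        channels = [group.makegateway().remote_exec(remote_pos_tag)
                    for _ in range(n_workers)]
        # Deal token batches out round-robin, remembering how many each worker got.
        pending = [0] * n_workers
        for i, start in enumerate(range(0, len(tokens), batch_size)):
            w = i % n_workers
            channels[w].send(tokens[start:start + batch_size])
            pending[w] += 1
        for ch in channels:
            ch.send(None)                       # sentinel: no more work
        # Merge the per-batch counts from every worker.
        total = collections.Counter()
        for ch, n in zip(channels, pending):
            for _ in range(n):
                total.update(ch.receive())
        return total
    finally:
        group.terminate()

You would call it as distributed_pos_counts(tokens).most_common(10). The gateways here are local subprocesses, which sidesteps the GIL for the CPU-bound tagging; execnet also supports remote hosts via SSH gateway specs if you want to spread the work across machines.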