
POS tagging in NLTK

Hi, is there an efficient way to tag parts of speech in very large files?

 import pandas as pd
 import collections
 import nltk

 # word_tokenize expects a string, not a DataFrame; join the
 # DataFrame's text column into one string first (column name assumed)
 text = " ".join(pandas_dataframe["text"].astype(str))
 tokens = nltk.word_tokenize(text)
 tags = nltk.pos_tag(tokens)
 counts = collections.Counter(tag for word, tag in tags)  # Counter, not counter

I am trying to find the most common parts of speech in a file and don't know of a better way of doing this.

Typically you want to avoid the explicit Python loop, the high memory load of holding all tokens and tags at once, and the resulting CPU cost.
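One way to keep memory bounded is to stream the file line by line and update a single `Counter` incrementally, instead of materializing every token and tag first. This is a minimal sketch of that pattern: the tagger and tokenizer are stubbed with trivial stand-ins so the example is self-contained; in real use you would swap in `nltk.word_tokenize` and `nltk.pos_tag`.

```python
import collections

def tag_tokens(tokens):
    # Placeholder for nltk.pos_tag: tags every token "NN".
    # Swap in nltk.pos_tag(tokens) for real tagging.
    return [(tok, "NN") for tok in tokens]

def count_tags(lines):
    # One pass, one Counter: memory stays proportional to the
    # tag set, not to the file size.
    counts = collections.Counter()
    for line in lines:
        tokens = line.split()  # stand-in for nltk.word_tokenize
        counts.update(tag for _, tag in tag_tokens(tokens))
    return counts

# `lines` can be any iterable, e.g. an open file handle,
# so the whole file never has to be loaded at once.
counts = count_tags(["the cat sat", "on the mat"])
print(counts.most_common(5))
```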

Here's an example of distributed part-of-speech tagging using Python and execnet.
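The linked example distributes the work across processes with execnet; the same split-tag-merge idea can be sketched with the standard library instead. Below, the input is cut into chunks, each chunk is tagged by a worker, and the per-chunk `Counter`s are merged. The tagger is again a stub (every token gets "NN") so the sketch runs on its own, and threads are used for simplicity; for CPU-bound tagging a process pool (or execnet, as in the linked example) would be the better fit.

```python
import collections
from concurrent.futures import ThreadPoolExecutor

def tag_chunk(lines):
    # Placeholder for real tagging (e.g. tokenizing each line and
    # running nltk.pos_tag); here every whitespace token is "NN".
    counts = collections.Counter()
    for line in lines:
        counts.update("NN" for _ in line.split())
    return counts

def parallel_tag_counts(lines, workers=4, chunk_size=1000):
    # Split the input into chunks, tag each chunk in a worker,
    # then merge the partial Counters into one total.
    chunks = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]
    total = collections.Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(tag_chunk, chunks):
            total.update(partial)
    return total

print(parallel_tag_counts(["the cat sat", "on the mat"], chunk_size=1).most_common(5))
```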
