
Speeding up Python Lemmatizer with 'for' Loop and 'if' Statements

I use this code to lemmatize each word according to its part-of-speech (POS) tag.

from nltk import pos_tag, word_tokenize
from nltk.stem import WordNetLemmatizer

def lemmatize_all(sentence):
    wnl = WordNetLemmatizer()
    lem = []
    # Tag every token, then lemmatize nouns, verbs and adjectives.
    for word, tag in pos_tag(word_tokenize(sentence)):
        if tag.startswith("NN"):
            lem.append(wnl.lemmatize(word, pos='n'))
        elif tag.startswith('VB'):
            lem.append(wnl.lemmatize(word, pos='v'))
        elif tag.startswith('JJ'):
            lem.append(wnl.lemmatize(word, pos='a'))
        else:
            lem.append(word)
    return lem

The problem is that the more data I have, the longer it takes. Could you please help me speed up the code?

I'm not sure this is what you're after, but it replicates the behavior of your code and is easy to extend.

def lemmatize_all(sentence):
    wnl = WordNetLemmatizer()
    lem = []

    tags = {
        'NN': 'n',
        'VB': 'v',
        'JJ': 'a',
    }

    for word, tag in pos_tag(word_tokenize(sentence)):
        tag_start = tag[:2]
        if tag_start in tags:
            lem.append(wnl.lemmatize(word, pos=tags[tag_start]))
        else:
            lem.append(word)
    return lem

This way a dictionary translates tag prefixes into WordNet pos values. If several tag prefixes map to the same pos, this variant may come in handy:

def lemmatize_all(sentence):
    wnl = WordNetLemmatizer()
    lem = []

    tags = {
        'n': ['NN','NA'],
        'v': ['VB','VA'],
        'a': ['JJ','JA'],
    }

    for word, tag in pos_tag(word_tokenize(sentence)):
        tag_start = tag[:2]
        if tag_start in tags['n']:
            lem.append(wnl.lemmatize(word, pos='n'))
        elif tag_start in tags['v']:
            lem.append(wnl.lemmatize(word, pos='v'))
        elif tag_start in tags['a']:
            lem.append(wnl.lemmatize(word, pos='a'))
        else:
            lem.append(word)
    return lem
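
The grouped dictionary can also be inverted once into a flat prefix-to-pos lookup, which removes the if/elif chain entirely. This is only a sketch: the names POS_TO_PREFIXES, PREFIX_TO_POS and wordnet_pos are made up for illustration, and NA, VA and JA are the same placeholder prefixes used in the code above.

```python
# Grouped mapping: one WordNet pos per key, several tag prefixes per value.
# NA, VA and JA are illustrative placeholders, not real tagger output.
POS_TO_PREFIXES = {
    'n': ['NN', 'NA'],
    'v': ['VB', 'VA'],
    'a': ['JJ', 'JA'],
}

# Invert once, at import time, into a flat prefix -> pos dictionary.
PREFIX_TO_POS = {
    prefix: pos
    for pos, prefixes in POS_TO_PREFIXES.items()
    for prefix in prefixes
}


def wordnet_pos(tag):
    """Return the WordNet pos for a tagger tag, or None if it has no mapping."""
    return PREFIX_TO_POS.get(tag[:2])
```

Inside the loop, a call such as wordnet_pos(tag) then replaces the three membership tests: a non-None result is passed to wnl.lemmatize, and None means the word is appended unchanged.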

I added the prefixes NA, VA and JA, which are placeholders rather than tags the tagger actually emits, to illustrate how to extend the code.
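
As for the speed question itself: restructuring the if/elif chain is unlikely to change the running time much, since most of the work probably happens in the tagger and in the WordNet lookups. One option worth measuring is to create the lemmatizer once and cache its results, because real text repeats the same (word, pos) pairs many times. The sketch below assumes that; make_lemmatizer, make_nltk_lemmatizer and cache_size are hypothetical names, and the two NLTK callables are passed in so that the caching logic can be tried without NLTK installed.

```python
from functools import lru_cache

# Tag prefix -> WordNet pos, as in the first version of the answer.
TAG_TO_POS = {'NN': 'n', 'VB': 'v', 'JJ': 'a'}


def make_lemmatizer(lemmatize, tag_tokens, cache_size=100_000):
    """Build a lemmatize_all(tokens) function around two injected callables.

    lemmatize(word, pos) -> str                 e.g. WordNetLemmatizer().lemmatize
    tag_tokens(tokens) -> [(word, tag), ...]    e.g. nltk.pos_tag
    """
    # Memoize on (word, pos) so each distinct pair is looked up only once.
    cached = lru_cache(maxsize=cache_size)(lemmatize)

    def lemmatize_all(tokens):
        result = []
        for word, tag in tag_tokens(tokens):
            pos = TAG_TO_POS.get(tag[:2])
            # Words without a mapped pos are kept unchanged, as in the original.
            result.append(cached(word, pos) if pos else word)
        return result

    return lemmatize_all


def make_nltk_lemmatizer():
    """Wire in NLTK; imported lazily so make_lemmatizer has no hard dependency."""
    from nltk import pos_tag
    from nltk.stem import WordNetLemmatizer
    return make_lemmatizer(WordNetLemmatizer().lemmatize, pos_tag)
```

Usage would look like lemmatize_all = make_nltk_lemmatizer(), created once, followed by lemmatize_all(word_tokenize(sentence)) for each sentence. Note that this version takes a list of tokens rather than a raw string. If tagging turns out to dominate, nltk.pos_tag_sents tags a list of tokenized sentences in one call and may be worth timing as well.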
