How to use lemmatization with the stanza library with a DataFrame in Python?

My current DataFrame is:

# required libraries
import pandas as pd

dict_noticia = {
    'nome_adm': ['CC Brasil', 'ABC Futuro Esporte', 'Tabuao'],
    'noticia': [
        "['folha', 'paulo', 'https', 'east', 'amazonaws', 'multclipp', 'arquivos', 'noticias', 'pdf', 'jpg', 'mônica', 'bergamo', 'longo', 'tempo']",
        "['coluna', 'estadão']",
        "['flamengo', 'futebol','melhor','campeao','é']",
    ],
}
                   
df = pd.DataFrame(dict_noticia)
df

I need a new column with the lemmas of the "noticia" column. The script below raises an error:

import stanza
nlp_stanza = stanza.Pipeline(lang='pt', processors='tokenize,mwt,pos,lemma')

def f_lematizacao_stanza(df,column_name,new_column_name):
    df[new_column_name] = df[column_name].apply(lambda x: ([w.lemma_ for w in nlp_stanza(row)]))
    return df

f_lematizacao_stanza(df,'noticia','noticia_lema')

NameError: name 'row' is not defined

How can I solve this?

Thank you in advance.

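You have not defined the variable row. The lambda's parameter is named x, so that is the name you need to use. Also, as far as I know, Stanza words expose their lemma as .lemma (.lemma_ is spaCy's attribute), and a Stanza document is traversed through doc.sentences and sentence.words rather than iterated directly:

def f_lematizacao_stanza(df, column_name, new_column_name):
    # use the lambda parameter x and collect word.lemma across all sentences
    df[new_column_name] = df[column_name].apply(
        lambda x: [w.lemma for s in nlp_stanza(x).sentences for w in s.words])
    return df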
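
One more thing worth checking: each cell of the noticia column holds its token list as a string ("['coluna', 'estadão']"), so passing a cell straight to the pipeline would most likely tokenize the brackets, quotes and commas as well. Below is a minimal sketch of one way around that, assuming the cells are valid Python list literals. The helper names tokens_to_text, lemmas_from_doc and lemmatize_cell are hypothetical and not part of pandas or Stanza; the only Stanza attributes relied on are doc.sentences, sentence.words and word.lemma.

```python
import ast


def tokens_to_text(cell):
    """Turn a stringified token list such as "['coluna', 'estadão']" into 'coluna estadão'."""
    # Assumption: string cells are valid Python list literals; real lists pass through.
    tokens = ast.literal_eval(cell) if isinstance(cell, str) else cell
    return " ".join(tokens)


def lemmas_from_doc(doc):
    """Collect lemmas from a Stanza-style document: doc.sentences -> sentence.words -> word.lemma."""
    return [word.lemma for sentence in doc.sentences for word in sentence.words]


def lemmatize_cell(cell, nlp):
    """Parse one cell, run the pipeline `nlp` on the joined text, and return the lemmas."""
    text = tokens_to_text(cell)
    if not text:
        # Nothing to lemmatize for an empty token list; skip the pipeline call.
        return []
    return lemmas_from_doc(nlp(text))


# Hypothetical usage with the names from the question
# (needs pandas, stanza and the downloaded 'pt' model):
# df['noticia_lema'] = df['noticia'].apply(lambda x: lemmatize_cell(x, nlp_stanza))
```

Passing the pipeline in as an argument keeps the helpers independent of the global nlp_stanza object, which also makes them easy to try out with a stand-in callable before loading the Portuguese model.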
