How to use lemmatization with the stanza library on a DataFrame in Python?
My current DataFrame is:
# required libraries
import pandas as pd
dict_noticia = {
    'nome_adm': ['CC Brasil', 'ABC Futuro Esporte', 'Tabuao'],
    'noticia': [
        "['folha', 'paulo', 'https', 'east', 'amazonaws', 'multclipp', 'arquivos', 'noticias', 'pdf', 'jpg', 'mônica', 'bergamo', 'longo', 'tempo']",
        "['coluna', 'estadão']",
        "['flamengo', 'futebol','melhor','campeao','é']",
    ],
}
df = pd.DataFrame(dict_noticia)
df
I need a new column with the lemmas of the "noticia" (news) column. The script below raises an error:
import stanza
nlp_stanza = stanza.Pipeline(lang='pt', processors='tokenize,mwt,pos,lemma')
def f_lematizacao_stanza(df, column_name, new_column_name):
    df[new_column_name] = df[column_name].apply(lambda x: ([w.lemma_ for w in nlp_stanza(row)]))
    return df
f_lematizacao_stanza(data,'noticia','noticia_lema')
NameError: name 'row' is not defined
How can I solve this?
Thank you in advance.
You have not defined the variable row. The lambda's argument is named x, so you need to use x:
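def f_lematizacao_stanza(df, column_name, new_column_name):
    # stanza returns a Document: walk its sentences and words, and read .lemma (.lemma_ is spaCy's name)
    df[new_column_name] = df[column_name].apply(
        lambda x: [w.lemma for s in nlp_stanza(x).sentences for w in s.words])
    return df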
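One more thing to watch for: each cell of the noticia column in the question is a string that merely looks like a list, so passing it straight to the pipeline would also feed it the brackets and quotes. stanza exposes lemmas as word.lemma on the words of doc.sentences. Below is a minimal sketch of one way to handle both points. The helper names parse_tokens and lemmatize_tokens are hypothetical, and fake_nlp is only a stand-in that mimics the shape of a stanza result so the sketch runs without downloading models; it is not part of stanza.

```python
import ast
from types import SimpleNamespace


def parse_tokens(cell):
    """Turn a cell like "['coluna', 'estadão']" into a real list of strings."""
    return ast.literal_eval(cell) if isinstance(cell, str) else list(cell)


def lemmatize_tokens(tokens, nlp):
    """Join the tokens, run the pipeline, and collect the lemma of every word."""
    if not tokens:
        return []
    doc = nlp(" ".join(tokens))
    return [word.lemma for sent in doc.sentences for word in sent.words]


# Hypothetical stand-in for a stanza pipeline (assumption, for demonstration only).
# It mimics just doc.sentences[i].words[j].lemma and "lemmatizes" by lowercasing.
def fake_nlp(text):
    words = [SimpleNamespace(text=t, lemma=t.lower()) for t in text.split()]
    return SimpleNamespace(sentences=[SimpleNamespace(words=words)])


cell = "['Coluna', 'Estadão']"
lemmas = lemmatize_tokens(parse_tokens(cell), fake_nlp)
print(lemmas)

# With the real objects from the question, the same helpers would be applied as:
# df['noticia_lema'] = df['noticia'].apply(
#     lambda cell: lemmatize_tokens(parse_tokens(cell), nlp_stanza))
```

With the real Portuguese pipeline, the mwt processor may split contractions into several words, so the number of lemmas is not guaranteed to match the number of input tokens.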