
How can I iterate on a column with spaCy to get named entities?

I have a DataFrame with a column named "categories". Some of the data in this column looks like this: {[], [], [amazon], [clothes], [telecommunication], [], ...}. Every row holds exactly one of these values. My task is to assign entities to these values. I tried a lot, but it didn't go well. This was my first attempt:

import spacy
nlp = spacy.load("de_core_news_sm")
doc=list(nlp.pipe(df.categories))
print([(X.text, X.label_) for X in doc.ents])
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
----> 1 print([(X.text, X.label_) for X in doc.ents])

AttributeError: 'list' object has no attribute 'ents'

My second attempt:

for token in doc:
    print(token.doc, token.pos_, token.dep_)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
      1 for token in doc:
----> 2     print(token.doc, token.pos_, token.dep_)

AttributeError: 'spacy.tokens.doc.Doc' object has no attribute 'pos_'

Third attempt:

docs = df["categories"].apply(nlp)
for token in docs:
    print(token.text, token.pos_, token.dep_)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
      1 docs = df["categories"].apply(nlp)
      2 for token in docs:
----> 3     print(token.docs, token.pos_, token.dep_)

AttributeError: 'spacy.tokens.doc.Doc' object has no attribute 'docs'

I just want to run spaCy over this column so that it gives me an entity for each value. For the empty values it should give me no entity. The column holds strings. Thanks for your help.

You have a list containing many Doc objects, so you need an extra for loop to work with each doc separately.

docs = list(nlp.pipe(df.categories))   # variable `docs` instead of `doc`

for doc in docs:   
    print([(X.text, X.label_) for X in doc.ents])

and

docs = list(nlp.pipe(df.categories))   # variable `docs` instead of `doc`

for doc in docs:   
    for token in doc:
        print(token.text, token.pos_, token.dep_)

The spaCy documentation on Language Processing Pipelines shows it like this:

for doc in nlp.pipe(df.categories):   
    print([(X.text, X.label_) for X in doc.ents])
    for token in doc:
        print(token.text, token.pos_, token.dep_)
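If you also need to know which row each Doc came from, one option is the as_tuples argument of nlp.pipe. The following is only a sketch under assumptions: it targets spaCy v3, and instead of de_core_news_sm it uses a blank German pipeline with a rule-based entity_ruler and a made-up "ORG" pattern, purely so it runs without downloading a model. With the real model the labels would come from its statistical NER component.

```python
import pandas as pd
import spacy

# Stand-in pipeline (assumption): blank German pipeline plus a rule-based
# entity ruler. Replace with spacy.load("de_core_news_sm") for real data.
nlp = spacy.blank("de")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([{"label": "ORG", "pattern": "amazon"}])  # illustrative pattern

df = pd.DataFrame({"categories": ["", "amazon", "clothes"]})

# as_tuples=True makes nlp.pipe accept (text, context) pairs and
# yield (doc, context) pairs, so each Doc stays tied to its row index.
pairs = zip(df["categories"], df.index)

entities_by_row = {}
for doc, row in nlp.pipe(pairs, as_tuples=True):
    entities_by_row[row] = [(ent.text, ent.label_) for ent in doc.ents]
    print(row, repr(doc.text), entities_by_row[row])
```

Rows with an empty string produce an empty Doc, so their entity list is simply empty.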

The same problem occurs with apply(nlp): it returns a Series of Doc objects, so you again need a nested loop.

docs = df["categories"].apply(nlp)

for doc in docs:
    for token in doc:
        print(token.text, token.pos_, token.dep_)

Full working example:

import spacy
import pandas as pd

df = pd.DataFrame({
    'categories': ['amazon', 'clothes', 'telecommunication']
})

nlp = spacy.load("de_core_news_sm")

print('\n--- version 1 ---\n')

docs = list(nlp.pipe(df.categories))

for doc in docs:
    print([(X.text, X.label_) for X in doc.ents])
    
    for token in doc:
        print(token.text, token.pos_, token.dep_)

print('\n--- version 2 ---\n')

docs = df["categories"].apply(nlp)

for doc in docs:
    for token in doc:
        print(token.text, token.pos_, token.dep_)
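
The examples above only print the results. To keep one entity list per row, with empty values yielding no entity as the question asks, something like the following may work. It is a sketch, not a definitive implementation: extract_entities is a hypothetical helper name, the code assumes spaCy v3, and it again uses a blank German pipeline with a rule-based entity_ruler and an illustrative "ORG" pattern in place of de_core_news_sm so that it runs without a model download.

```python
import pandas as pd
import spacy


def extract_entities(nlp, texts):
    """Return one list of (text, label) pairs per input value."""
    # Replace missing values with "" so nlp.pipe always receives strings.
    cleaned = ["" if pd.isna(t) else str(t) for t in texts]
    return [
        [(ent.text, ent.label_) for ent in doc.ents]
        for doc in nlp.pipe(cleaned)
    ]


# Stand-in pipeline (assumption): swap in spacy.load("de_core_news_sm")
# to get entities from the trained model instead of a fixed pattern.
nlp = spacy.blank("de")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([{"label": "ORG", "pattern": "amazon"}])  # illustrative pattern

df = pd.DataFrame({"categories": ["", "amazon", "clothes", None]})

# Wrap the result in a Series aligned to df.index so each list lands in its row.
df["entities"] = pd.Series(extract_entities(nlp, df["categories"]), index=df.index)
print(df)
```

Empty strings and missing values end up with an empty list in the "entities" column, while rows the pipeline recognizes get their (text, label) pairs.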

