How to find and match each element of a list in each sentence?

I have a file containing some sentences. I used polyglot for Named Entity Recognition and stored all detected entities in a list. Now I want to check, for each sentence, whether any entity or pair of entities occurs in it, and print those entities.

Here is what I did:

from polyglot.text import Text

file = open('input_raw.txt', 'r')
input_file = file.read()
test = Text(input_file, hint_language_code='fa')

list_entity = []
for sent in test.sentences:
    #print(sent[:10], "\n")
    for entity in test.entities:
       list_entity.append(entity)

for i in range(len(test)):
    m = test.entities[i]
    n = test.words[m.start: m.end] # it shows only word not tag
    if str(n).split('.')[-1] in test: # if each entities exist in each sentence
         print(n)

It gives me an empty list.

Input:

 sentence1: Bill Gate is the founder of Microsoft.
 sentence2: Trump is the president of USA.

Expected output:

Bill Gate, Microsoft
Trump, USA

Output of list_entity:

I-PER(['Trump']), I-LOC(['USA'])

How can I check whether I-PER(['Trump']) and I-LOC(['USA']) are in the first sentence?

For starters, your inner loop reads test.entities, the entities of the whole input text, on every pass instead of the entities of the current sentence. Take the entities from each sentence in the polyglot object instead.

from polyglot.text import Text
file = open('input_raw.txt', 'r')
input_file = file.read()
file = Text(input_file, hint_language_code='fa')

list_entity = []
for sentence in file.sentences:
    for entity in sentence.entities:
        #print(entity)
        list_entity.append(entity)

print(list_entity)

Now the list is no longer empty.
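
To get from that flat list to the expected output (one line of entities per sentence), the entities can be grouped by the sentence they come from. The following is a minimal sketch, not part of polyglot: entities_per_sentence is a hypothetical helper, and it only assumes that each sentence has an entities attribute whose items iterate over their words as strings, as the polyglot objects in the code above appear to do. The SimpleNamespace objects are stand-in data so the sketch runs without polyglot or its models.

```python
from types import SimpleNamespace


def entities_per_sentence(sentences):
    """Return one list of entity strings per sentence.

    Assumes each sentence has an `entities` attribute and that each
    entity is iterable over its words (hypothetical stand-in for a Chunk).
    """
    result = []
    for sentence in sentences:
        # Join multi-word entities such as ['Bill', 'Gate'] into one string.
        result.append([" ".join(entity) for entity in sentence.entities])
    return result


# Stand-in sentences mimicking the question's input (not real polyglot objects).
demo_sentences = [
    SimpleNamespace(entities=[["Bill", "Gate"], ["Microsoft"]]),
    SimpleNamespace(entities=[["Trump"], ["USA"]]),
]

for names in entities_per_sentence(demo_sentences):
    print(", ".join(names))
```

With the stand-in data this prints one comma-separated line per sentence. With polyglot, the same helper would presumably be called with file.sentences in place of demo_sentences.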


As for your problem with identifying the entity terms: I have not found a way to generate an entity by hand, so the following simply checks whether the sentence has entities containing the same terms. A Chunk can hold several strings, so we iterate over them.

from polyglot.text import Text
file = open('input_raw.txt', 'r')
input_file = file.read()
file = Text(input_file, hint_language_code='ar')

def check_sentence(entities_list, sentence):
    ## Return True only if every string term appears in some entity
    ## of the sentence.
    for term in entities_list:
        ## Compare the term with each word of each Chunk in the sentence.
        if not any(any(entityTerm == term for entityTerm in entityObject)
                   for entityObject in sentence.entities):
            return False
    return True

sentence_number = 1 # Index of the sentence to check (0-based, so 1 is the second sentence)
sentence = file.sentences[sentence_number]
entity_terms = ["Bill", 
                "Gates"]

if check_sentence(entity_terms, sentence):
    print("Entity Terms " + str(entity_terms) +  
          " are in the sentence. '" + str(sentence)+ "'")
else:
    print("Sentence '" + str(sentence) + 
          "' doesn't contain terms " + str(entity_terms))

Once you find a way to generate arbitrary entities, all you have to do is compare whole entities in the sentence checker instead of the terms pulled out of them, so that the entity type is compared as well.
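
One way to approximate the type comparison without constructing polyglot entities by hand is to describe each wanted entity as a plain (tag, words) pair and compare it with the tag and words of each entity in the sentence. This is only a sketch: sentence_has_entities and FakeChunk are hypothetical names, and it assumes each entity exposes a tag attribute such as 'I-PER' and iterates over its words, as the printed output I-PER(['Trump']) suggests.

```python
from types import SimpleNamespace


class FakeChunk(list):
    """Stand-in for an entity: a list of words plus a tag (assumption)."""

    def __init__(self, words, tag):
        super().__init__(words)
        self.tag = tag


def sentence_has_entities(wanted, sentence):
    """Return True if every (tag, words) pair in `wanted` matches an entity."""
    found = {
        (entity.tag, tuple(str(word) for word in entity))
        for entity in sentence.entities
    }
    return all((tag, tuple(words)) in found for tag, words in wanted)


# Stand-in for the second sentence of the question's input.
demo_sentence = SimpleNamespace(
    entities=[FakeChunk(["Trump"], "I-PER"), FakeChunk(["USA"], "I-LOC")]
)

# Both the words and the tag must match.
print(sentence_has_entities([("I-PER", ["Trump"]), ("I-LOC", ["USA"])], demo_sentence))
# Same word, wrong tag: no match.
print(sentence_has_entities([("I-LOC", ["Trump"])], demo_sentence))
```

Building a set of (tag, words) pairs keeps the check independent of how the library constructs its entity objects; only reading tag and iterating over the words is needed.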


If you just want to match the list of entities in the file against a specific sentence, then this should do the trick:

from polyglot.text import Text
file = open('input_raw.txt', 'r')
input_file = file.read()
file = Text(input_file, hint_language_code='fa')

def return_match(entities_list, sentence): ## Check if and which chunks
    matches = []                           ## are in the sentence
    for term in entities_list:                  
        ## Check each list in each Chunk object 
        ## and see if there's any matches.
        for entity in sentence.entities:
            if entity == term:
                for word in entity:
                    matches.append(word)
    return matches

def return_list_of_entities(file):
    list_entity = []
    for sentence in file.sentences:
        for entity in sentence.entities:
            list_entity.append(entity)
    return list_entity

list_entity = return_list_of_entities(file)
sentence_number = 1 # Index of the sentence to check (0-based, so 1 is the second sentence)
sentence = file.sentences[sentence_number]
match = return_match(list_entity, sentence)

if match:
    print("Entity Term " + str(match) +  
          " is in the sentence. '" + str(sentence)+ "'")
else:
    print("Sentence '" + str(sentence) + 
          "' doesn't contain any of the terms " + str(list_entity))
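
The snippet above checks a single sentence. To run the same comparison over every sentence, the loop can be lifted into a helper that reports matches per sentence index. This is a sketch under stated assumptions: matches_by_sentence is a hypothetical name, entities are compared as plain lists of words, and the SimpleNamespace objects are stand-in data rather than real polyglot sentences.

```python
from types import SimpleNamespace


def matches_by_sentence(entities_list, sentences):
    """Map each sentence index to the known entities found in that sentence."""
    # Normalise the known entities to plain word lists for comparison.
    known = [list(entity) for entity in entities_list]
    report = {}
    for index, sentence in enumerate(sentences):
        report[index] = [
            list(entity) for entity in sentence.entities if list(entity) in known
        ]
    return report


# Stand-in sentences mimicking the question's input.
demo_sentences = [
    SimpleNamespace(entities=[["Bill", "Gate"], ["Microsoft"]]),
    SimpleNamespace(entities=[["Trump"], ["USA"]]),
]
# Entities to look for (hypothetical selection).
known_entities = [["Trump"], ["Microsoft"]]

for index, found in matches_by_sentence(known_entities, demo_sentences).items():
    print(index, found)
```

A sentence with no matches maps to an empty list, so the caller can tell "checked, nothing found" apart from a sentence that was never checked.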
