簡體   English   中英

在 python、Nlp 問題中創建(引理,NER 類型)的元組

[英]Create tuples of (lemma, NER type) in python , Nlp problem

我寫了下面的代碼,並為它制作了一個字典,但我想要創建(引理,NER類型)的元組並收集我不知道該怎么做的元組的計數? 你能幫幫我嗎? NER類型表示名稱實體識別

text = """
Seville.
Summers in the flamboyant Andalucían capital often nudge 40C, but spring is a delight, with the parks in bloom and the scent of orange blossom and jasmine in the air. And in Semana Santa (Holy Week, 14-20 April) the streets come alive with floats and processions. There is also the raucous annual Feria de Abril – a week-long fiesta of parades, flamenco and partying long into the night (4-11 May; expect higher hotel prices if you visit then).
Seville is a romantic and energetic place, with sights aplenty, from the Unesco-listed cathedral – the largest Gothic cathedral in the world – to the beautiful Alcázar royal palace. But days here are best spent simply wandering the medieval streets of Santa Cruz and along the river to La Real Maestranza, Spain’s most spectacular bullring.
Seville is the birthplace of tapas and perfect for a foodie break – join a tapas tour (try devoursevillefoodtours.com), or stop at the countless bars for a glass of sherry with local jamón ibérico (check out Bar Las Teresas in Santa Cruz or historic Casa Morales in Constitución). Great food markets include the Feria, the oldest, and the wooden, futuristic-looking Metropol Parasol.
Nightlife is, unsurprisingly, late and lively. For flamenco, try one of the peñas, or flamenco social clubs – Torres Macarena on C/Torrijano, perhaps – with bars open across town until the early hours.
Book it: In an atmospheric 18th-century house, the Hospes Casa del Rey de Baeza is a lovely place to stay in lively Santa Cruz. Doubles from £133 room only, hospes.com
Trieste.
"""

doc = nlp(text).ents
en = [(entity.text, entity.label_) for entity in doc]
en
#entities
#The list stored in variable entities is has type list[list[tuple[str, str]]],
#from pprint import pprint
pprint(en)

sum(filter(None, entities), [])

from collections import defaultdict
type2entities = defaultdict(list)
for entity, entity_type in sum(filter(None, entities), []):
  type2entities[entity_type].append(entity)

from pprint import pprint
pprint(type2entities)


我希望以下代碼片段可以解決您的問題。

import spacy

# Load English tokenizer, tagger, parser, NER and word vectors
nlp = spacy.load("en_core_web_sm")

text = ("Seville.
Summers in the flamboyant Andalucían capital often nudge 40C, but spring is a delight, with the parks in bloom and the scent of orange blossom and jasmine in the air. And in Semana Santa (Holy Week, 14-20 April) the streets come alive with floats and processions. There is also the raucous annual Feria de Abril – a week-long fiesta of parades, flamenco and partying long into the night (4-11 May; expect higher hotel prices if you visit then).
Seville is a romantic and energetic place, with sights aplenty, from the Unesco-listed cathedral – the largest Gothic cathedral in the world – to the beautiful Alcázar royal palace. But days here are best spent simply wandering the medieval streets of Santa Cruz and along the river to La Real Maestranza, Spain’s most spectacular bullring.
Seville is the birthplace of tapas and perfect for a foodie break – join a tapas tour (try devoursevillefoodtours.com), or stop at the countless bars for a glass of sherry with local jamón ibérico (check out Bar Las Teresas in Santa Cruz or historic Casa Morales in Constitución). Great food markets include the Feria, the oldest, and the wooden, futuristic-looking Metropol Parasol.
Nightlife is, unsurprisingly, late and lively. For flamenco, try one of the peñas, or flamenco social clubs – Torres Macarena on C/Torrijano, perhaps – with bars open across town until the early hours.
Book it: In an atmospheric 18th-century house, the Hospes Casa del Rey de Baeza is a lovely place to stay in lively Santa Cruz. Doubles from £133 room only, hospes.com
Trieste.")

doc = nlp(text)

lemma_ner_list = [] 

for entity in doc.ents: 
   lemma_ner_list.append((entity.lemma_, entity.label_)) 

# print list of lemma ner tuples

print(lemma_ner_list)

# print count of tuples

print(len(lemma_ner_list))

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM