

Unable to retrieve any candidates with kb.get_candidates

I created a CSV file like this:

"CAMERA", "Camera", "kamera", "cam", "Kamera"
"PICTURE", "Picture", "bild", "photograph"

I then used it roughly like this:

import de_core_news_sm
from spacy.kb import KnowledgeBase

nlp = de_core_news_sm.load()
text = "Cam is not good"
doc = nlp(text)

# load_entities() is my own CSV-parsing helper (not shown here)
name_dict, desc_dict = load_entities()

# entity_vector_length must match the length of desc_doc.vector
kb = KnowledgeBase(vocab=nlp.vocab, entity_vector_length=96)

for qid, desc in desc_dict.items():
    desc_doc = nlp(desc)
    desc_enc = desc_doc.vector
    kb.add_entity(entity=qid, entity_vector=desc_enc, freq=342)  # 342 is an arbitrary value here

for qid, name in name_dict.items():
    kb.add_alias(alias=name, entities=[qid], probabilities=[1])  # 100% prior probability P(entity|alias)
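Since load_entities() is not shown, here is a hypothetical stand-in for illustration only. It assumes the first CSV column is the entity ID and every remaining column is an alias for it, and it reads the file with Python's standard csv module using skipinitialspace=True so that the blank after each comma is not kept as part of the value. The name load_aliases and the returned dict layout are assumptions, not the poster's code.

```python
import csv
import io


def load_aliases(csv_file):
    """Hypothetical loader: map each entity ID (first column) to the
    list of aliases found in the remaining columns."""
    alias_dict = {}
    # skipinitialspace=True ignores the blank that follows each comma, so the
    # next double quote is parsed as a quote character rather than as data.
    reader = csv.reader(csv_file, skipinitialspace=True)
    for row in reader:
        if not row:  # csv.reader yields [] for blank lines; skip them
            continue
        qid, *names = row
        alias_dict[qid] = names
    return alias_dict


SAMPLE_CSV = (
    '"CAMERA", "Camera", "kamera", "cam", "Kamera"\n'
    '"PICTURE", "Picture", "bild", "photograph"\n'
)

print(load_aliases(io.StringIO(SAMPLE_CSV)))
# {'CAMERA': ['Camera', 'kamera', 'cam', 'Kamera'], 'PICTURE': ['Picture', 'bild', 'photograph']}
```

With a real file, pass a handle opened with open(path, newline="", encoding="utf-8"), as the csv module documentation recommends. Each name in a list could then go to kb.add_alias(alias=name, entities=[qid], probabilities=[1]) in a nested loop, which would also register spellings such as "kamera" and "cam" that a one-name-per-ID name_dict never adds.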

Printing the values like this:

print(f"Entities in the KB: {kb.get_entity_strings()}")
print(f"Aliases in the KB: {kb.get_alias_strings()}")

gives me:

Entities in the KB: ['PICTURE', 'CAMERA']
Aliases in the KB: [' "Camera"', ' "Picture"']

However, if I try to check the candidates, I just get an empty list:

candidates = kb.get_candidates("Camera")
print(candidates)
for c in candidates:
    print(" ", c.entity_, c.prior_prob, c.entity_vector)
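A quick way to spot this kind of mismatch is to print each stored alias with repr(), which makes leading spaces and quote characters visible, and to look for aliases that differ from the query only by surrounding whitespace or quotes. The helper below is a hypothetical, plain-Python sketch that works on the list printed by kb.get_alias_strings(); it does not call spaCy, and the name near_miss_aliases is made up for this example.

```python
def near_miss_aliases(query, stored_aliases):
    """Return stored aliases that are not equal to `query` but become equal
    once surrounding whitespace and double quotes are stripped."""

    def normalize(s):
        return s.strip().strip('"').strip()

    target = normalize(query)
    return [a for a in stored_aliases if a != query and normalize(a) == target]


# The alias list exactly as printed by kb.get_alias_strings() above
aliases = [' "Camera"', ' "Picture"']

for alias in aliases:
    print(repr(alias))  # the leading space and the quotes become visible

print(near_miss_aliases("Camera", aliases))  # [' "Camera"']
```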

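Answer:

Aliases in the KB: [' "Camera"', ' "Picture"']

It looks to me as though your parsing script added the literal string "Camera" to the KB, with the leading space, the quotes and everything, rather than just the plain string Camera. Because get_candidates() looks up the exact alias string, kb.get_candidates("Camera") does not find it and returns an empty list.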
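This matches how Python's standard csv module handles a blank after the delimiter: with the default dialect, a field that starts with a space is treated as unquoted, so the quote characters are kept as data. The entity IDs come out clean ('CAMERA') while the aliases keep their quotes, which is exactly what csv.reader produces for this file, so the sketch below assumes load_entities() reads the file that way (its code is not shown). It reproduces the ' "Camera"' value and shows two fixes: passing skipinitialspace=True, or writing the file without blanks after the commas.

```python
import csv
import io

LINE = '"CAMERA", "Camera", "kamera", "cam", "Kamera"\n'

# Default (excel) dialect: the blank after each comma starts an unquoted
# field, so the double quotes end up inside the value.
default_row = next(csv.reader(io.StringIO(LINE)))

# skipinitialspace=True skips whitespace right after the delimiter, so the
# double quotes are recognised as quoting again.
fixed_row = next(csv.reader(io.StringIO(LINE), skipinitialspace=True))

# Alternatively, remove the blanks after the commas in the data itself.
compact_row = next(csv.reader(io.StringIO(LINE.replace(", ", ","))))

print(default_row)  # ['CAMERA', ' "Camera"', ' "kamera"', ' "cam"', ' "Kamera"']
print(fixed_row)    # ['CAMERA', 'Camera', 'kamera', 'cam', 'Kamera']
print(compact_row)  # ['CAMERA', 'Camera', 'kamera', 'cam', 'Kamera']
```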

