Unable to retrieve any candidates with kb.get_candidates
I created a CSV file like this:
"CAMERA", "Camera", "kamera", "cam", "Kamera"
"PICTURE", "Picture", "bild", "photograph"
and I use it roughly like this:
import de_core_news_sm
from spacy.kb import KnowledgeBase

nlp = de_core_news_sm.load()
text = "Cam is not good"
doc = nlp(text)
name_dict, desc_dict = load_entities()
kb = KnowledgeBase(vocab=nlp.vocab, entity_vector_length=96)
for qid, desc in desc_dict.items():
    desc_doc = nlp(desc)
    desc_enc = desc_doc.vector
    kb.add_entity(entity=qid, entity_vector=desc_enc, freq=342)  # 342 is an arbitrary value here
for qid, name in name_dict.items():
    kb.add_alias(alias=name, entities=[qid], probabilities=[1])  # 100% prior probability P(entity|alias)
Printing the values like this:
print(f"Entities in the KB: {kb.get_entity_strings()}")
print(f"Aliases in the KB: {kb.get_alias_strings()}")
gives me:
Entities in the KB: ['PICTURE', 'CAMERA']
Aliases in the KB: [' "Camera"', ' "Picture"']
However, if I try to check the candidates, I only get an empty list:
candidates = kb.get_candidates("Camera")
print(candidates)
for c in candidates:
    print(" ", c.entity_, c.prior_prob, c.entity_vector)
Aliases in the KB: [' "Camera"', ' "Picture"']
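It looks to me like your parsing script added the literal string ' "Camera"' to the KB, leading space and quotes and all, instead of just the raw string Camera? Since the alias lookup is an exact string match, kb.get_candidates("Camera") would then find nothing.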
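To illustrate the point above, here is a minimal sketch of how the stray space and quotes can end up in the aliases, assuming the file is read with Python's csv module (the question does not show load_entities, so this is a guess at the cause, and parse_alias_rows is a hypothetical helper, not part of spaCy). With the default settings, a space after the comma stops the following quote from being treated as a quote character; passing skipinitialspace=True is one way to avoid that.

```python
import csv
import io

# Sample rows copied from the question; a real script would open the file instead.
CSV_TEXT = '''"CAMERA", "Camera", "kamera", "cam", "Kamera"
"PICTURE", "Picture", "bild", "photograph"
'''


def parse_alias_rows(text, skip_space):
    """Return {entity_id: [aliases]} (hypothetical helper, not part of spaCy)."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=skip_space)
    return {row[0]: row[1:] for row in reader if row}


# Default parsing keeps the space after each comma, so the quotes stay literal.
naive = parse_alias_rows(CSV_TEXT, skip_space=False)
# skipinitialspace=True drops that space, so the quotes are parsed as quoting.
clean = parse_alias_rows(CSV_TEXT, skip_space=True)

print(repr(naive["CAMERA"][0]))  # ' "Camera"'
print(repr(clean["CAMERA"][0]))  # 'Camera'
```

If the aliases are stored in the cleaned form, a lookup with the plain string Camera has a matching alias to find.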