
Google Cloud NLP - No Entities Returned

We are having some issues with the Google NLP service. The service is intermittently refusing to return entities for certain terms. We use the NLP annotate API on free-text answers to survey questions. A recent question related to an image of a UK children's TV character called Zippy. Some example responses are below. Unfortunately we had thousands of responses like this, and none of them detected "zippy" as an entity. Strangely, "elmo", "zippie" and others were detected without any issue; only this specific string ("zippy") came back with no entities. Any ideas why this might be?

{
"sentences": [{
    "text": {
        "content": "zippy",
        "beginOffset": 0
    },
    "sentiment": {
        "magnitude": 0.1,
        "score": 0.1
    }
}],
"tokens": [],
"entities": [],
"documentSentiment": {
    "magnitude": 0.1,
    "score": 0.1
},
"language": "en",
"categories": []
}

"rainbow" detected but not "zippy" 检测到“彩虹”,但未检测到“ zippy”

{
"sentences": [{
    "text": {
        "content": "zippy from rainbow",
        "beginOffset": 0
    },
    "sentiment": {
        "magnitude": 0.1,
        "score": 0.1
    }
}],
"tokens": [],
"entities": [{
    "name": "rainbow",
    "type": "OTHER",
    "metadata": [],
    "salience": 1,
    "mentions": [{
        "text": {
            "content": "rainbow",
            "beginOffset": 11
        },
        "type": "COMMON"
    }]
}],
"documentSentiment": {
    "magnitude": 0.1,
    "score": 0.1
},
"language": "en",
"categories": []
}

"zippie" detected fine “ zippie”检测良好

{
"sentences": [{
    "text": {
        "content": "zippie",
        "beginOffset": 0
    },
    "sentiment": {
        "magnitude": 0,
        "score": 0
    }
}],
"tokens": [],
"entities": [{
    "name": "zippie",
    "type": "OTHER",
    "metadata": [],
    "salience": 1,
    "mentions": [{
        "text": {
            "content": "zippie",
            "beginOffset": 0
        },
        "type": "PROPER"
    }]
}],
"documentSentiment": {
    "magnitude": 0,
    "score": 0
},
"language": "en",
"categories": []
}

"elmo" detected fine “ elmo”检测良好

{
"sentences": [{
    "text": {
        "content": "elmo",
        "beginOffset": 0
    },
    "sentiment": {
        "magnitude": 0.1,
        "score": 0.1
    }
}],
"tokens": [],
"entities": [{
    "name": "elmo",
    "type": "OTHER",
    "metadata": [],
    "salience": 1,
    "mentions": [{
        "text": {
            "content": "elmo",
            "beginOffset": 0
        },
        "type": "COMMON"
    }]
}],
"documentSentiment": {
    "magnitude": 0.1,
    "score": 0.1
},
"language": "en",
"categories": []
}
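For reference, the responses above come from the `annotateText` endpoint. A minimal sketch of the request body that would produce fields like these (sentences, tokens, entities, document sentiment, categories) might look like the following; the exact feature flags are an assumption based on the v1 REST API, not taken from the question:

```python
import json

def build_annotate_request(text: str) -> dict:
    """Sketch of a v1 annotateText request body for a plain-text document."""
    return {
        "document": {
            "type": "PLAIN_TEXT",
            "language": "en",
            "content": text,
        },
        # Feature flags inferred from the response fields shown above.
        "features": {
            "extractSyntax": True,
            "extractEntities": True,
            "extractDocumentSentiment": True,
            "classifyText": True,
        },
        "encodingType": "UTF8",
    }

print(json.dumps(build_annotate_request("zippy from rainbow"), indent=2))
```

Sending this same body with "zippy" versus "zippie" as the content is what produced the empty versus populated `entities` arrays above.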

Services like these are trained on a specific corpus of 'entity' values.

The service tokenizes/chunks the text, then uses part-of-speech tagging to identify noun phrases and checks each one against a giant index to see whether that noun phrase is a known entity.

"Zippy" must not be in the corpus. I'm not sure about Google NLP, but Watson NLU comes with a GUI product for easily creating your own 'dictionary' of entity noun phrases.

It is also very possible to create your own using NLTK, or from scratch in Python, but all of these require the effort of manually curating your own 'dictionary', unless you are able to get your hands on and adapt an existing one.
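To illustrate that 'dictionary' approach, here is a toy gazetteer-based matcher in plain Python: a hand-curated dictionary of entity phrases matched against the text with their character offsets. All the entries and type labels below are invented for the example, not drawn from any real corpus:

```python
import re

# Hand-curated gazetteer: phrase -> entity type.
# Entries are illustrative only.
GAZETTEER = {
    "zippy": "TV_CHARACTER",
    "elmo": "TV_CHARACTER",
    "rainbow": "TV_SHOW",
}

def find_entities(text: str) -> list:
    """Return gazetteer hits in the text, with begin offsets, in order."""
    hits = []
    for phrase, etype in GAZETTEER.items():
        # Whole-word, case-insensitive matching of each dictionary phrase.
        pattern = r"\b" + re.escape(phrase) + r"\b"
        for m in re.finditer(pattern, text, re.IGNORECASE):
            hits.append({"name": phrase, "type": etype, "beginOffset": m.start()})
    return sorted(hits, key=lambda h: h["beginOffset"])

print(find_entities("zippy from rainbow"))
```

Because the dictionary is yours, "zippy" is found even though the hosted service misses it; the trade-off is that you must curate and maintain the list yourself.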
