NLP Google Cloud with fewer than 20 words

Question

According to this documentation, the classifyText method requires at least 20 words.

https://cloud.google.com/natural-language/docs/classifying-text#language-classify-content-nodejs

If I send content with fewer than 20 words, no matter how clear the content is, I get:

Invalid text content: too few tokens (words) to process.

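I'm looking for a way to force this through without distorting the NLP too much. Are there neutral-vector words that can be appended to a short phrase so that classifyText will process it?

For example: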

const language = require('@google-cloud/language');

async function quickstart() {
    const client = new language.LanguageServiceClient();

    // Fewer than 20 words. What if I append some other neutral words
    // (a, of, it, to), or would it be better to repeat the phrase?
    const text = 'The Atlanta Braves is the best team.';

    const document = {
        content: text,
        type: 'PLAIN_TEXT',
    };

    // For short text this call fails with:
    // "Invalid text content: too few tokens (words) to process."
    const [classification] = await client.classifyText({document});
    console.log('Categories:');
    classification.categories.forEach(category => {
        console.log(`Name: ${category.name}, Confidence: ${category.confidence}`);
    });
}

quickstart().catch(console.error);

Answer 1

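The problem with this is that you add bias no matter what kind of text you send.

Your only chance is to pad your string up to the minimum word limit with empty words that the preprocessor and tokenizer will filter out before they reach the neural network.

I would try appending a suffix to the end of the sentence that uses only stop words from NLTK, like this: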

document.content += ". and ourselves as herself for each all above into through nor me and then by doing";

Why at the end? Because text usually carries more information at the beginning.
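A minimal sketch of that padding idea, for illustration only. The helper names `countWords` and `padToMinWords` are hypothetical, and the code assumes that whitespace-separated words roughly match what the API counts as tokens, which may not hold exactly. The stop words are the ones from the suffix above; they are cycled so that very short inputs still reach the minimum.

```javascript
// Stop words taken from the suffix suggested above (all are in NLTK's English list).
const STOP_WORDS = ['and', 'ourselves', 'as', 'herself', 'for', 'each', 'all', 'above',
  'into', 'through', 'nor', 'me', 'then', 'by', 'doing'];

// Assumption: whitespace-separated words approximate the API's token count.
function countWords(text) {
  const trimmed = text.trim();
  return trimmed === '' ? 0 : trimmed.split(/\s+/).length;
}

// Hypothetical helper: append stop words at the end, and only when the text is too short.
function padToMinWords(text, minWords = 20) {
  const missing = minWords - countWords(text);
  if (missing <= 0) return text; // long enough already, leave untouched

  const filler = [];
  for (let i = 0; i < missing; i++) {
    filler.push(STOP_WORDS[i % STOP_WORDS.length]);
  }

  // Start the filler as its own sentence without doubling existing punctuation.
  const base = text.trim();
  const separator = /[.!?]$/.test(base) ? ' ' : '. ';
  return base + separator + filler.join(' ');
}
```

In the question's code this would be used as `content: padToMinWords(text)` when building the document. If the API counts punctuation as separate tokens, the padded text ends up slightly above the minimum, which is harmless here.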

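If Google does not filter stop words behind the scenes (which I doubt), this will only add white noise in places where the network has no focus or attention.

Keep in mind: don't add this string when you already have enough words, because you pay per 1K-character block, and that is counted before any filtering.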
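To make the cost point concrete, here is a rough sketch. It assumes billing is counted in blocks of 1,000 characters, rounded up, as described above; check the current pricing page before relying on it. `billedUnits` and `paddingAddsUnit` are hypothetical names, not part of the client library.

```javascript
// Assumption: one billed unit per started block of 1,000 characters.
const CHARS_PER_UNIT = 1000;

// Estimate the number of billed units for a piece of text.
function billedUnits(text) {
  // Count Unicode code points rather than UTF-16 code units.
  const length = Array.from(text).length;
  return Math.ceil(length / CHARS_PER_UNIT);
}

// True when appending the suffix pushes the text into an extra billed unit.
function paddingAddsUnit(text, suffix) {
  return billedUnits(text + suffix) > billedUnits(text);
}
```

For a text that is already long, the suffix adds nothing to the classification but can tip the request into another block, which is why the padding should stay conditional.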

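I would also add that suffix to the sentences with fewer than 20 words in your training/test/validation sets and see how it works. The network should learn to ignore the whole appended sentence.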
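One way this could look for a dataset, as a sketch only: the shape of `splits` (an object mapping a split name to an array of sentences) and the name `padSplits` are assumptions made for the example, and words are again counted by whitespace.

```javascript
// Suffix from the answer above; stop words only.
const SUFFIX = '. and ourselves as herself for each all above into through nor me and then by doing';
const MIN_WORDS = 20;

// Assumption: whitespace-separated words approximate the token count.
const wordCount = (sentence) => sentence.trim().split(/\s+/).filter(Boolean).length;

// Pad every short sentence in every split and report how many were touched.
function padSplits(splits) {
  const padded = {};
  const counts = {};
  for (const [name, sentences] of Object.entries(splits)) {
    counts[name] = 0;
    padded[name] = sentences.map((sentence) => {
      if (wordCount(sentence) >= MIN_WORDS) return sentence; // leave long sentences alone
      counts[name] += 1;
      return sentence + SUFFIX;
    });
  }
  return { padded, counts };
}
```

The per-split counts give a quick sense of how much of the data is affected, which helps when comparing results with and without the suffix.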

Solution 1
1 Accepted 2021-05-26 09:59:19
