Get Significant Text aggregation on text field with stop words filtering
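I am trying to find the most frequently used words in a text field (called "text") of my index. I managed to do this with the "significant_text" aggregation, but some of the returned buckets contain words such as "the", "a", "they", etc. How can I filter them out? I tried using the stop analyzer, but it did not help. I also tried the "gnd" heuristic, which is supposed to help with this problem, but I still got roughly the same results.
My query: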
GET feed/_search
{
  "size": 0,
  "query": {
    "match_all": {}
  },
  "aggs": {
    "by_sentiment": {
      "terms": {
        "field": "sentiment.Sentiment.keyword",
        "size": 50
      },
      "aggs": {
        "trending_topics": {
          "significant_text": {
            "field": "text",
            "filter_duplicate_text": true
          }
        }
      }
    },
    "by_level": {
      "terms": {
        "field": "level",
        "size": 50
      },
      "aggs": {
        "trending_topics": {
          "significant_text": {
            "field": "text",
            "filter_duplicate_text": true
          }
        }
      }
    },
    "by_asset": {
      "terms": {
        "field": "asset_id",
        "size": 50
      },
      "aggs": {
        "trending_topics": {
          "significant_text": {
            "field": "text",
            "filter_duplicate_text": true
          }
        }
      }
    }
  }
}
I managed to solve this by adding an
"exclude": ["list","of","stop","words"]
parameter to each "significant_text" aggregation. For anyone interested, this is the exact list I used:
"exclude": ["t.co", "https", "rt", "l", "they", "i", "I", "you", "this", "that", "but", "its", "s", "for", "there", "going", "try", "into", "me", "don’t", "every", "because", "got", "thank", "thanks", "looks", "cha", "been", "would", "my", "from", "now", "and", "im", "mine", "u", "the", "to", "can't", "than", "cant", "in", "self", "of", "with", "your", "is", "do", "not", "ii", "despite", "however", "there's", "isn't", "seems", "though", "a", "via", "will", "also", "that's", "even", "we", "anymore", "anyone", "all", "have", "on", "if", "sure", "as", "at", "are", "it", "so", "be", "are", "everyone", "just", "can", "by", "what", "does", "please", "an", "these", "de", "how", "he", "haha", "were", "us", "should", "when", "or", "o", "another", "those", "am", "yourselves", "don't", "without", "then", "gotta", "myself", "we'll", "our", "we've", "www.reddit.com", "know", "number", "which", "while", "name", "comments", "up", "you're", "seem", "isn't", "being", "them", "ha", "perhaps", "about", "has", "each", "something", "haven't", "their", "t.me", "r", "est", "la", "le", "vous", "et", "à", "les", "pour", "avec", "el", "en", "que", "para", "no"]
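Since the same "exclude" list has to go into all three "significant_text" aggregations, it may be easier to build the request body in code than to paste the list three times. Below is a minimal Python sketch of that idea. The names `STOP_WORDS`, `BASE_AGGS`, and `add_exclude` are made up for illustration, the stop word list is shortened, and actually sending the body to the `feed/_search` endpoint is left out.

```python
import copy
import json

# Shortened, illustrative list; the full list above could be pasted here instead.
STOP_WORDS = ["the", "a", "they", "rt", "https", "t.co"]


def add_exclude(aggs, stop_words):
    """Return a copy of an "aggs" dict with "exclude" set on every
    significant_text aggregation, including nested sub-aggregations."""
    result = copy.deepcopy(aggs)
    for agg in result.values():
        if "significant_text" in agg:
            agg["significant_text"]["exclude"] = list(stop_words)
        # Sub-aggregations may be declared under either key.
        for key in ("aggs", "aggregations"):
            if key in agg:
                agg[key] = add_exclude(agg[key], stop_words)
    return result


def trending_topics():
    """The significant_text sub-aggregation shared by every terms bucket."""
    return {
        "trending_topics": {
            "significant_text": {"field": "text", "filter_duplicate_text": True}
        }
    }


# Same three terms aggregations as in the query from the question.
BASE_AGGS = {
    name: {"terms": {"field": field, "size": 50}, "aggs": trending_topics()}
    for name, field in [
        ("by_sentiment", "sentiment.Sentiment.keyword"),
        ("by_level", "level"),
        ("by_asset", "asset_id"),
    ]
}

body = {
    "size": 0,
    "query": {"match_all": {}},
    "aggs": add_exclude(BASE_AGGS, STOP_WORDS),
}

if __name__ == "__main__":
    # Print the JSON body so it can be pasted after "GET feed/_search".
    print(json.dumps(body, indent=2))
```

The helper works on a deep copy, so `BASE_AGGS` stays untouched and can be reused with a different stop word list.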