Any efficient way to find the surrounding ADJ with respect to a target phrase in Python?

Question

I am doing sentiment analysis on a set of documents, and my goal is to find the closest or surrounding adjectives with respect to a target phrase in each sentence. I have an idea of how to extract the surrounding words for a target phrase, but how do I find the closest adjective, NNP, VBN, or other POS tag relative to that phrase?

Here is a sketch of how I might get the surrounding words with respect to my target phrase.

# Lists (not sets) keep each sentence aligned with its target phrase;
# pandas also refuses to build a Series from an unordered set.
sentence_List = ["Obviously one of the most important features of any computer is the human interface.",
                 "Good for everyday computing and web browsing.",
                 "My problem was with DELL Customer Service",
                 "I play a lot of casual games online[comma] and the touchpad is very responsive"]

target_phraseList = ["human interface", "everyday computing", "DELL Customer Service", "touchpad"]

Note that my original dataset is a DataFrame containing the list of sentences and their respective target phrases. Here I just simulate the data as follows:

import pandas as pd
# Index each sentence by its target phrase, then convert to a one-column DataFrame
df = pd.Series(sentence_List, index=target_phraseList)
df = pd.DataFrame(df)

Here I tokenize the sentences as follows:

from nltk.tokenize import word_tokenize
tokenized_sents = [word_tokenize(i) for i in sentence_List]
tokenized = list(tokenized_sents)  # shallow copy of the list of token lists

Then I try to find the surrounding words with respect to my target phrases using the approach linked here. However, I want to find the closest adjective, verb, or VBN relative to my target phrase. How can I make this happen? Any ideas? Thanks.

Answer 1

Would something like the following work for you? I recognize that some tweaks are needed to make this fully useful (checking for upper/lower case; also, if there is a tie, it returns the word ahead of the target in the sentence rather than the one behind), but hopefully it is useful enough to get you started:

import nltk
from nltk.tokenize import MWETokenizer

def smart_tokenizer(sentence, target_phrase):
    """
    Tokenize a sentence using a full target phrase.
    """
    tokenizer = MWETokenizer()
    target_tuple = tuple(target_phrase.split())
    tokenizer.add_mwe(target_tuple)
    token_sentence = nltk.pos_tag(tokenizer.tokenize(sentence.split()))

    # MWETokenizer joins the words of a multi-word expression with its
    # separator (an underscore by default), so build that joined form to look up
    temp_phrase = target_phrase.replace(' ', '_')
    target_index = [i for i, y in enumerate(token_sentence) if y[0] == temp_phrase]
    if len(target_index) == 0:
        return None, None
    else:
        return token_sentence, target_index[0]


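def search(text_tag, tokenized_sentence, target_index):
    """
    Search for a part of speech (POS) nearest a target phrase of interest.
    On a tie, the word ahead of the target wins over the one behind.
    Returns None if no token carries the requested tag.
    """
    n = len(tokenized_sentence)
    # Start at offset 1 so the target phrase itself is never returned
    for offset in range(1, n):
        ahead = target_index + offset
        behind = target_index - offset
        # entry[0] is the word; entry[1] is the POS
        if ahead < n and tokenized_sentence[ahead][1] == text_tag:
            return tokenized_sentence[ahead][0]
        # Explicit bounds check: a negative index would wrap around silently
        if behind >= 0 and tokenized_sentence[behind][1] == text_tag:
            return tokenized_sentence[behind][0]
    return None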

x, i = smart_tokenizer(sentence='My problem was with DELL Customer Service',
                       target_phrase='DELL Customer Service')
print(search('NN', x, i))

y, j = smart_tokenizer(sentence="Good for everyday computing and web browsing.",
                       target_phrase="everyday computing")
print(search('NN', y, j))
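The search function above matches one exact tag, while the question asks about whole families such as adjectives (JJ, JJR, JJS in the Penn Treebank tag set) or verbs (VB, VBD, VBG, VBN, VBP, VBZ). A minimal sketch of a prefix-based variant follows. The name nearest_by_tag is hypothetical, and the tags in the example are written by hand for illustration rather than produced by a tagger, so real output from nltk.pos_tag may differ.

```python
def nearest_by_tag(tagged, target_index, tag_prefixes):
    """Return (word, distance) for the nearest token whose POS tag starts
    with one of tag_prefixes, or None if no such token exists."""
    prefixes = tuple(tag_prefixes)
    for offset in range(1, len(tagged)):
        # Look ahead first, then behind, so ties go to the word ahead
        for idx in (target_index + offset, target_index - offset):
            if 0 <= idx < len(tagged) and tagged[idx][1].startswith(prefixes):
                return tagged[idx][0], offset
    return None


# Hand-written (word, tag) pairs; index 1 is the target "touchpad"
tagged = [("the", "DT"), ("touchpad", "NN"), ("is", "VBZ"),
          ("very", "RB"), ("responsive", "JJ")]

print(nearest_by_tag(tagged, 1, ("JJ",)))  # ('responsive', 3)
print(nearest_by_tag(tagged, 1, ("VB",)))  # ('is', 1)
```

Returning the distance along with the word makes it possible to discard matches that are too far from the target to be relevant.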

Edit: I made some changes to address the issue of using an arbitrary-length target phrase, as you can see in the smart_tokenizer function. The key there is the nltk.tokenize.MWETokenizer class (for more info see: Python: Tokenizing with phrases). Hopefully this helps. As an aside, I would challenge the idea that spaCy is necessarily more elegant: at some point, someone has to write the code to get the work done. That will be either the spaCy devs or you, as you roll your own solution. Their API is rather complicated, so I'll leave that exercise to you.
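As an alternative to merging the phrase into one underscore-joined token, the target can be located by searching the token list for the phrase as a contiguous sub-sequence. The sketch below does this case-insensitively, which also covers the upper/lower case tweak mentioned earlier. find_phrase is a hypothetical helper, not part of NLTK, and the token list is written by hand to stand in for tokenizer output.

```python
def find_phrase(tokens, phrase_tokens):
    """Return (start, end) of the first case-insensitive match of
    phrase_tokens inside tokens (end is exclusive), or None if absent."""
    needle = [t.lower() for t in phrase_tokens]
    haystack = [t.lower() for t in tokens]
    n = len(needle)
    if n == 0:
        return None
    # Slide a window of the phrase length across the sentence
    for start in range(len(haystack) - n + 1):
        if haystack[start:start + n] == needle:
            return start, start + n
    return None


tokens = ["My", "problem", "was", "with", "DELL", "Customer", "Service"]
span = find_phrase(tokens, "dell customer service".split())
print(span)  # (4, 7)
```

With the span known, a nearest-tag search can step outward from start - 1 and from end, so the words inside the phrase are never returned as their own neighbours.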

Answer 1 (accepted, score 2, 2018-11-20 21:48:38)
