Any efficient way to find the surrounding ADJ with respect to a target phrase in Python?

I am doing sentiment analysis on a set of documents. My goal is to find the adjectives closest to, or surrounding, a target phrase in each sentence. I have an idea of how to extract the words surrounding a target phrase, but how do I find the closest adjective, NNP, VBN, or other POS-tagged word with respect to the target phrase?

Here is a sketch of how I might get the words surrounding my target phrase.

# Lists (not sets) so each sentence stays aligned with its target phrase
sentence_List = ["Obviously one of the most important features of any computer is the human interface.",
                 "Good for everyday computing and web browsing.",
                 "My problem was with DELL Customer Service",
                 "I play a lot of casual games online[comma] and the touchpad is very responsive"]

target_phraseList = ["human interface", "everyday computing", "DELL Customer Service", "touchpad"]

Note that my original dataset was given as a dataframe containing the sentences and their respective target phrases. Here I just simulate the data as follows:

import pandas as pd
# Index each sentence by its target phrase
df = pd.Series(sentence_List, index=target_phraseList)
df = pd.DataFrame(df)

Here I tokenize the sentences as follows:

from nltk.tokenize import word_tokenize
# One list of tokens per sentence
tokenized_sents = [word_tokenize(i) for i in sentence_List]
tokenized = list(tokenized_sents)  # shallow copy of the token lists

Then I try to find the words surrounding my target phrases by using the approach shown here. However, I want to find the relatively closer or closest adjective, verb, or VBN with respect to my target phrase. How can I make this happen? Any idea how to get this done? Thanks
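For reference, the "surrounding words" step described above could be sketched roughly as follows. This is only an illustration in plain Python: the helper names `find_phrase` and `surrounding_words` and the `window` parameter are hypothetical, and the token list is written out by hand rather than produced by `word_tokenize`.

```python
def find_phrase(tokens, phrase_tokens):
    """Return the start index of phrase_tokens inside tokens, or -1 if absent."""
    n = len(phrase_tokens)
    for start in range(len(tokens) - n + 1):
        if tokens[start:start + n] == phrase_tokens:
            return start
    return -1


def surrounding_words(tokens, phrase_tokens, window=2):
    """Return (left, right) lists of up to `window` tokens around the phrase."""
    start = find_phrase(tokens, phrase_tokens)
    if start == -1:
        return None
    end = start + len(phrase_tokens)
    left = tokens[max(0, start - window):start]
    right = tokens[end:end + window]
    return left, right


# Hand-written tokens (assumption); in practice they would come from word_tokenize
tokens = ["My", "problem", "was", "with", "DELL", "Customer", "Service"]
print(surrounding_words(tokens, ["DELL", "Customer", "Service"], window=2))
# -> (['was', 'with'], [])
```

The match here is exact and case-sensitive, so a phrase written with different capitalization would not be found.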

Would something like the following work for you? I recognize there are some tweaks that need to be made to make this fully useful (checking for upper/lower case; it will also return the word ahead in the sentence rather than the one behind if there is a tie), but hopefully it is useful enough to get you started:

import nltk
from nltk.tokenize import MWETokenizer

def smart_tokenizer(sentence, target_phrase):
    """
    Tokenize a sentence using a full target phrase.
    """
    tokenizer = MWETokenizer()
    target_tuple = tuple(target_phrase.split())
    tokenizer.add_mwe(target_tuple)
    token_sentence = nltk.pos_tag(tokenizer.tokenize(sentence.split()))

    # MWETokenizer joins the words of a matched phrase with its separator ('_' by default),
    # so look for the target phrase in that joined form
    temp_phrase = target_phrase.replace(' ', '_')
    target_index = [i for i, y in enumerate(token_sentence) if y[0] == temp_phrase]
    if len(target_index) == 0:
        return None, None
    else:
        return token_sentence, target_index[0]


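def search(text_tag, tokenized_sentence, target_index):
    """
    Search for a part of speech (POS) nearest a target phrase of interest.
    Returns None if no token has that tag; on a tie, the word ahead wins.
    """
    # Start at offset 1 so the target phrase itself is never returned
    for offset in range(1, len(tokenized_sentence)):
        ahead = target_index + offset
        behind = target_index - offset
        # Each entry is a (word, POS) tuple: entry[0] is the word; entry[1] is the POS
        if ahead < len(tokenized_sentence) and tokenized_sentence[ahead][1] == text_tag:
            return tokenized_sentence[ahead][0]
        # Check bounds explicitly: a negative index would silently wrap around
        if behind >= 0 and tokenized_sentence[behind][1] == text_tag:
            return tokenized_sentence[behind][0]
    return None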

x, i = smart_tokenizer(sentence='My problem was with DELL Customer Service',
                       target_phrase='DELL Customer Service')
print(search('NN', x, i))

y, j = smart_tokenizer(sentence="Good for everyday computing and web browsing.",
                       target_phrase="everyday computing")
print(search('NN', y, j))
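Since the question asks about adjectives, VBN, and NNP rather than a single tag, one possible extension is to accept several tag prefixes at once and also report the distance. The sketch below is an assumption-laden illustration, not part of the code above: `nearest_tagged` and its `tag_prefixes` parameter are hypothetical names, and the tags in the example are assigned by hand rather than produced by `nltk.pos_tag`. A prefix such as `"JJ"` also matches `JJR` and `JJS`.

```python
def nearest_tagged(tagged, target_index, tag_prefixes=("JJ", "VBN", "NNP")):
    """
    Return (word, tag, distance) for the token closest to target_index whose
    tag starts with one of tag_prefixes, or None if no token qualifies.
    On a tie, the token after the target is preferred.
    """
    prefixes = tuple(tag_prefixes)
    candidates = [
        # (distance, is_behind, word, tag): False sorts before True,
        # so a token ahead of the target wins a tie
        (abs(i - target_index), i < target_index, word, tag)
        for i, (word, tag) in enumerate(tagged)
        if i != target_index and tag.startswith(prefixes)
    ]
    if not candidates:
        return None
    distance, _, word, tag = min(candidates)
    return word, tag, distance


# Hand-tagged tokens for illustration only (assumption, not pos_tag output)
tagged = [("and", "CC"), ("the", "DT"), ("touchpad", "NN"),
          ("is", "VBZ"), ("very", "RB"), ("responsive", "JJ")]
print(nearest_tagged(tagged, target_index=2, tag_prefixes=("JJ",)))
# -> ('responsive', 'JJ', 3)
```

Returning the distance makes it possible to apply a cutoff afterwards, for example ignoring adjectives more than a few tokens away from the target.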

Edit: I made some changes to address the issue of using an arbitrary-length target phrase, as you can see in the smart_tokenizer function. The key there is the nltk.tokenize.MWETokenizer class (for more info see: Python: Tokenizing with phrases). Hopefully this helps. As an aside, I would challenge the idea that spaCy is necessarily more elegant: at some point, someone has to write the code to get the work done. That will be either the spaCy devs or you, as you roll your own solution. Their API is rather complicated, so I'll leave that exercise to you.
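On the upper/lower-case tweak mentioned earlier: one way to handle it, sketched here under stated assumptions, is to merge the target phrase into a single token with a case-insensitive comparison before tagging. This is a plain-Python stand-in and not how MWETokenizer itself works; `merge_phrase` is a hypothetical helper, and the `'_'` separator is chosen only to mirror MWETokenizer's default.

```python
def merge_phrase(tokens, target_phrase, separator="_"):
    """
    Collapse the first case-insensitive occurrence of target_phrase into one
    token. Returns (new_tokens, index_of_merged_token), or (tokens, None)
    if the phrase does not occur.
    """
    phrase = [w.lower() for w in target_phrase.split()]
    lowered = [t.lower() for t in tokens]
    n = len(phrase)
    for start in range(len(tokens) - n + 1):
        if n and lowered[start:start + n] == phrase:
            # Keep the original casing of the sentence in the merged token
            merged = separator.join(tokens[start:start + n])
            return tokens[:start] + [merged] + tokens[start + n:], start
    return list(tokens), None


# The sentence and the target phrase differ in capitalization
tokens = ["My", "problem", "was", "with", "Dell", "customer", "service"]
print(merge_phrase(tokens, "DELL Customer Service"))
# -> (['My', 'problem', 'was', 'with', 'Dell_customer_service'], 4)
```

The returned index can then be used as the target index after POS tagging the merged token list.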
