
How to do Topic Detection in Unsupervised Aspect Based Sentiment Analysis

I want to build an ABSA in Python that analyses the sentiment of predefined aspects (e.g. delivery, quality, service) in online reviews. I want to do this unsupervised, because that saves me from manually labelling reviews and lets me analyse far more review data (I am looking at roughly 100k reviews). My dataset therefore contains only the review texts, no ratings. I want a model that first detects the aspect category and then assigns a sentiment polarity. For example, when a review says "The shipment went smooth, but the product was damaged", I want the model to assign the word "shipment" to the aspect category "delivery" and to associate "smooth" with a positive sentiment.

I have looked into possible approaches, and I was wondering whether anyone has experience with this and can point me in a direction that would help me. It would be much appreciated!
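For concreteness, the kind of mapping described above ("shipment" → delivery, "smooth" → positive) can be prototyped without any labels by comparing review tokens against a few seed words per predefined category using pre-trained word embeddings. The sketch below is only a rough baseline under assumptions of my own: the GloVe model loaded through gensim's downloader, the seed lists and the similarity threshold are all illustrative choices, not a validated setup.

# Minimal unsupervised aspect-category detection via embedding similarity.
# Seed words and threshold are illustrative assumptions, not a tuned setup.
import gensim.downloader as api

wv = api.load("glove-wiki-gigaword-100")  # pre-trained GloVe vectors (downloaded on first use)

ASPECT_SEEDS = {  # hand-picked seed words per predefined category (assumption)
    "delivery": ["delivery", "shipping", "shipment", "courier"],
    "quality":  ["quality", "damaged", "broken", "defective"],
    "service":  ["service", "support", "staff", "customer"],
}

def detect_aspects(review, threshold=0.55):
    """Map tokens in a review to the closest aspect category by cosine similarity."""
    hits = []
    for token in review.lower().split():
        token = token.strip(".,!?")
        if token not in wv:
            continue
        # Score the token against every category's seed words, keep the best match.
        scores = {cat: max(wv.similarity(token, s) for s in seeds if s in wv)
                  for cat, seeds in ASPECT_SEEDS.items()}
        best_cat, best_score = max(scores.items(), key=lambda kv: kv[1])
        if best_score >= threshold:
            hits.append((token, best_cat, round(float(best_score), 2)))
    return hits

print(detect_aspects("The shipment went smooth but the product was damaged"))
# e.g. [('shipment', 'delivery', ...), ('damaged', 'quality', ...)]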

Aspect Based Sentiment Analysis (ABSA) is the task of first extracting the aspects or features of an entity from a given text (i.e. Aspect Term Extraction, ATE) and then determining the sentiment polarity (SP), if any, expressed towards each aspect of that entity. The importance of ABSA has led to the creation of dedicated ABSA shared tasks such as SemEval.

B-LSTM and CRF classifiers are used for feature extraction and aspect term detection in both supervised and unsupervised ATE.


https://www.researchgate.net/profile/Andreea_Hossmann/publication/319875533_Unsupervised_Aspect_Term_Extraction_with_B-LSTM_and_CRF_using_Automatically_Labelled_Datasets/links/5a3436a70f7e9b10d842b0eb/Unsupervised-Aspect-Term-Extraction-with-B-LSTM-and-CRF-using-Automatically-Labelled-Datasets.pdf
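To make the B-LSTM + CRF idea concrete: aspect term extraction is usually cast as sequence tagging with BIO labels, where a CRF layer decodes the most likely tag sequence over the BiLSTM's per-token scores. The following is a toy sketch of that architecture, not the implementation from the linked paper; it uses random embeddings, made-up token ids and the third-party pytorch-crf package, and the tag-id mapping is my own assumption.

# Toy BiLSTM-CRF sequence tagger for aspect term extraction (BIO tagging).
# Assumes the third-party package pytorch-crf:  pip install pytorch-crf
import torch
import torch.nn as nn
from torchcrf import CRF

class BiLSTMCRFTagger(nn.Module):
    def __init__(self, vocab_size, num_tags=3, embed_dim=50, hidden_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim // 2, bidirectional=True, batch_first=True)
        self.hidden2tag = nn.Linear(hidden_dim, num_tags)
        self.crf = CRF(num_tags, batch_first=True)

    def forward(self, token_ids, tags=None):
        emissions = self.hidden2tag(self.lstm(self.embed(token_ids))[0])
        if tags is not None:
            # training: negative log-likelihood of the gold tag sequence
            return -self.crf(emissions, tags)
        # inference: Viterbi-decode the most likely BIO tag sequence
        return self.crf.decode(emissions)

# "the shipment went smooth" with "shipment" as the aspect term.
# Tag ids (assumption): 0 = O, 1 = B-ASP, 2 = I-ASP
model = BiLSTMCRFTagger(vocab_size=100)
tokens = torch.tensor([[5, 17, 23, 42]])   # fake token ids
gold   = torch.tensor([[0, 1, 0, 0]])      # O, B-ASP, O, O
loss = model(tokens, gold)                 # scalar training loss
pred = model(tokens)                       # decoded tag sequence per sentence

In the unsupervised setting of the paper above, the training labels for this tagger are produced automatically rather than annotated by hand, but the tagging architecture itself stays the same.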

https://github.com/songyouwei/ABSA-PyTorch/blob/master/infer_example.py

# -*- coding: utf-8 -*-
# file: infer.py
# author: songyouwei <youwei0314@gmail.com>
# Copyright (C) 2019. All Rights Reserved.

import torch
import torch.nn.functional as F
import argparse

from data_utils import build_tokenizer, build_embedding_matrix
from models import IAN, MemNet, ATAE_LSTM, AOA


class Inferer:
    """A simple inference example"""
    def __init__(self, opt):
        self.opt = opt
        self.tokenizer = build_tokenizer(
            fnames=[opt.dataset_file['train'], opt.dataset_file['test']],
            max_seq_len=opt.max_seq_len,
            dat_fname='{0}_tokenizer.dat'.format(opt.dataset))
        embedding_matrix = build_embedding_matrix(
            word2idx=self.tokenizer.word2idx,
            embed_dim=opt.embed_dim,
            dat_fname='{0}_{1}_embedding_matrix.dat'.format(str(opt.embed_dim), opt.dataset))
        self.model = opt.model_class(embedding_matrix, opt)
        print('loading model {0} ...'.format(opt.model_name))
        self.model.load_state_dict(torch.load(opt.state_dict_path))
        self.model = self.model.to(opt.device)
        # switch model to evaluation mode
        self.model.eval()
        torch.autograd.set_grad_enabled(False)

    def evaluate(self, raw_texts):
        context_seqs = [self.tokenizer.text_to_sequence(raw_text.lower().strip()) for raw_text in raw_texts]
        aspect_seqs = [self.tokenizer.text_to_sequence('null')] * len(raw_texts)
        context_indices = torch.tensor(context_seqs, dtype=torch.int64).to(self.opt.device)
        aspect_indices = torch.tensor(aspect_seqs, dtype=torch.int64).to(self.opt.device)

        t_inputs = [context_indices, aspect_indices]
        t_outputs = self.model(t_inputs)

        t_probs = F.softmax(t_outputs, dim=-1).cpu().numpy()
        return t_probs


if __name__ == '__main__':
    model_classes = {
        'atae_lstm': ATAE_LSTM,
        'ian': IAN,
        'memnet': MemNet,
        'aoa': AOA,
    }
    # set your trained models here
    model_state_dict_paths = {
        'atae_lstm': 'state_dict/atae_lstm_restaurant_acc0.7786',
        'ian': 'state_dict/ian_restaurant_acc0.7911',
        'memnet': 'state_dict/memnet_restaurant_acc0.7911',
        'aoa': 'state_dict/aoa_restaurant_acc0.8063',
    }
    class Option(object): pass
    opt = Option()
    opt.model_name = 'ian'
    opt.model_class = model_classes[opt.model_name]
    opt.dataset = 'restaurant'
    opt.dataset_file = {
        'train': './datasets/semeval14/Restaurants_Train.xml.seg',
        'test': './datasets/semeval14/Restaurants_Test_Gold.xml.seg'
    }
    opt.state_dict_path = model_state_dict_paths[opt.model_name]
    opt.embed_dim = 300
    opt.hidden_dim = 300
    opt.max_seq_len = 80
    opt.polarities_dim = 3
    opt.hops = 3
    opt.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

    inf = Inferer(opt)
    t_probs = inf.evaluate(['happy memory', 'the service is terrible', 'just normal food'])
    print(t_probs.argmax(axis=-1) - 1)
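Note that in this inference script the aspect is hard-coded to the placeholder token 'null' (see evaluate), so the three example sentences are scored without a real aspect term; for aspect-level predictions you would pass the extracted aspect terms instead of 'null'. With polarities_dim = 3, the final print maps the softmax argmax to -1, 0 and 1, which correspond to negative, neutral and positive in this repository's label encoding.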
