How to integrate a POS tagger with the SentiWordNet algorithm

Here is my SentiWordNet algorithm:



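import java.io.BufferedReader;
import java.io.FileReader;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Set;
import java.util.Vector;

public class SWN3 {

        private String pathToSWN = "C:/Users/RAHUL/Desktop/SWN/SentiWordNet_3.0.0.txt";
        // Maps "word#pos" (pos is n, v, a or r) to its weighted sentiment score
        private HashMap<String, Double> _dict;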

        public SWN3(){

            _dict = new HashMap<String, Double>();
            HashMap<String, Vector<Double>> _temp = new HashMap<String, Vector<Double>>();
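            // try-with-resources closes the reader even if parsing fails
            try (BufferedReader csv = new BufferedReader(new FileReader(pathToSWN))) {
                String line;
                while((line = csv.readLine()) != null)
                {
                    // Skip blank lines and the '#' comment lines in the SentiWordNet file
                    if(line.trim().isEmpty() || line.trim().startsWith("#"))
                        continue;
                    // Columns: POS, ID, PosScore, NegScore, SynsetTerms, Gloss
                    String[] data = line.split("\t");
                    Double score = Double.parseDouble(data[2])-Double.parseDouble(data[3]);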
                    String[] words = data[4].split(" ");
                    for(String w:words)
                    {
                        String[] w_n = w.split("#");
                        w_n[0] += "#"+data[0];
                        int index = Integer.parseInt(w_n[1])-1;
                        // Fetch or create the list of per-sense scores for this word#pos key
                        Vector<Double> v = _temp.get(w_n[0]);
                        if(v == null)
                        {
                            v = new Vector<Double>();
                            _temp.put(w_n[0], v);
                        }
                        // Pad with zeros up to the sense rank, then store the score at that rank.
                        // set() overwrites the slot; add(index, ...) would shift later senses.
                        while(v.size() <= index)
                            v.add(0.0);
                        v.set(index, score);
                    }
                }
                Set<String> temp = _temp.keySet();
                for (Iterator<String> iterator = temp.iterator(); iterator.hasNext();) {
                    String word = iterator.next();
                    Vector<Double> v = _temp.get(word);
                    double score = 0.0;
                    double sum = 0.0;
                    for(int i = 0; i < v.size(); i++)
                        score += ((double)1/(double)(i+1))*v.get(i);
                    for(int i = 1; i<=v.size(); i++)
                        sum += (double)1/(double)i;
                    score /= sum;
                    // The label is computed here but only the numeric score is stored in _dict
                    String sent = "";
                    if(score >= 0.75)
                        sent = "strong_positive";
                    else if(score > 0.50 && score < 0.75)
                        sent = "moderately_positive";
                    else if(score > 0.25 && score <= 0.50)
                        sent = "positive";
                    else if(score > 0 && score <= 0.25)
                        sent = "weak_positive";
                    else if(score < 0 && score >= -0.25)
                        sent = "weak_negative";
                    else if(score < -0.25 && score >= -0.5)
                        sent = "negative";
                    else if(score < -0.50 && score > -0.75)
                        sent = "moderately_negative";
                    else if(score <= -0.75)
                        sent = "strong_negative";
                    _dict.put(word, score);
                }
            }
            catch(Exception e){e.printStackTrace();}        
        }

public Double extract(String word)
{
   Double total = new Double(0);
    if(_dict.get(word+"#n") != null)
         total = _dict.get(word+"#n") + total;
    if(_dict.get(word+"#a") != null)
        total = _dict.get(word+"#a") + total;
    if(_dict.get(word+"#r") != null)
        total = _dict.get(word+"#r") + total;
    if(_dict.get(word+"#v") != null)
        total = _dict.get(word+"#v") + total;
    return total;
}



public static String SentiWord(String stri) {
    SWN3 test = new SWN3();
    String sentence=stri;
    String[] words = sentence.split("\\s+"); 
    double totalScore = 0;
    for(String word : words) {
        word = word.replaceAll("([^a-zA-Z\\s])", "");
        if (test.extract(word) == null)
            continue;
        totalScore += test.extract(word);
    }

    String sent = "";               
    if(totalScore>=0.75)
        sent = "strong_positive";
    else
    if(totalScore > 0.25 && totalScore<0.75)
        sent = "positive";
   ....
   ....

    return sent;
}

}
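For readers following the constructor: each word#pos key ends up with a single score, computed as a weighted average of its per-sense scores in which the sense ranked i gets weight 1/i. The following is a minimal standalone sketch of just that step. The class name SenseWeighting, the method weightedScore and the three sample scores are made up for illustration and are not part of SWN3 or of the SentiWordNet data.

```java
import java.util.Arrays;
import java.util.List;

public class SenseWeighting {

    // Weighted average of per-sense scores: the sense at rank i (1-based) gets weight 1/i.
    static double weightedScore(List<Double> senseScores) {
        double score = 0.0;
        double sum = 0.0;
        for (int i = 0; i < senseScores.size(); i++) {
            double weight = 1.0 / (i + 1);
            score += weight * senseScores.get(i);
            sum += weight;
        }
        // An empty list has no senses, so report a neutral score
        return sum == 0.0 ? 0.0 : score / sum;
    }

    public static void main(String[] args) {
        // Hypothetical (PosScore - NegScore) values for three senses of one word
        List<Double> senses = Arrays.asList(0.5, 0.0, -0.25);
        System.out.println(weightedScore(senses));
    }
}
```

With these sample values the first sense dominates: the result is 5/22, roughly 0.227, even though the third sense is negative.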

And here is my POS tagger method:

import java.io.IOException;

import edu.stanford.nlp.tagger.maxent.MaxentTagger;

public class TagText {
    public static void main(String[] args) throws IOException,
            ClassNotFoundException {

        // Initialize the tagger
        MaxentTagger tagger = new MaxentTagger("taggers/english-left3words-distsim.tagger");

        // The sample string
        String sample = "This is a sample text";

        // The tagged string
        String tagged = tagger.tagString(sample);

        // Output the tagged sample string to the console
        System.out.println("Input: " + sample);
        System.out.println("Output: " + tagged);
    }
}

I need to integrate the POS tagger with SentiWordNet. I am trying to build a sentiment analysis system. Right now the SentiWordNet code runs without POS tagging, but the results are poor. I cannot figure out how to combine the two. Please help.

You could adapt your extract method in SWN3 like this:

public Double extract(String word, String tail) {
    // Every noun tag (NN, NNS, NNP, NNPS) contains "NN"
    if (tail.contains("NN"))
        return _dict.get(word + "#n");
    // Every verb tag (VB, VBD, VBG, VBN, VBP, VBZ) contains "VB"
    else if (tail.contains("VB"))
        return _dict.get(word + "#v");
    // Every adjective tag (JJ, JJR, JJS) contains "JJ"
    else if (tail.contains("JJ"))
        return _dict.get(word + "#a");
    // Every adverb tag (RB, RBR, RBS) contains "RB"
    else if (tail.contains("RB"))
        return _dict.get(word + "#r");
    else
        return null;
}
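The mapping in the extract method above can also be isolated into a small helper, which makes it easy to check on its own. This is only a sketch of one possible variant: the class name PosMapper and the method toSwnPos are hypothetical, and it matches on the tag prefix rather than with contains. One consequence of that choice is that the Penn Treebank tag WRB (wh-adverb), which contains "RB" but does not start with it, is not treated as an adverb here.

```java
public class PosMapper {

    // Returns the SentiWordNet POS letter ("n", "v", "a" or "r") for a Penn Treebank tag,
    // or null when the tag has no SentiWordNet counterpart (determiners, pronouns, ...).
    static String toSwnPos(String tag) {
        if (tag == null) return null;
        if (tag.startsWith("NN")) return "n"; // NN, NNS, NNP, NNPS
        if (tag.startsWith("VB")) return "v"; // VB, VBD, VBG, VBN, VBP, VBZ
        if (tag.startsWith("JJ")) return "a"; // JJ, JJR, JJS
        if (tag.startsWith("RB")) return "r"; // RB, RBR, RBS
        return null;
    }

    public static void main(String[] args) {
        for (String tag : new String[] {"NNS", "VBZ", "JJR", "RB", "DT"}) {
            System.out.println(tag + " -> " + toSwnPos(tag));
        }
    }
}
```

A dictionary lookup would then use word + "#" + toSwnPos(tag) as the key whenever the helper returns a non-null letter.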

It maps the tagger's part-of-speech tags to the word types defined in SentiWordNet. I suggest changing your main method like this:

public static void main(String[] args) {
    MaxentTagger tagger = new MaxentTagger("files/english-left3words-distsim.tagger");

    //String sample = "This is a sample text";
    String sample = "It works much better with this great example!";
    // Strip punctuation so the tagger's tokens line up with the whitespace split below
    sample = sample.replaceAll("([^a-zA-Z\\s])", "");
    String[] words = sample.split("\\s+");

    // Tag the sentence once; each token comes back as word_TAG
    String taggedSample = tagger.tagString(sample);
    String[] taggedWords = taggedSample.trim().split("\\s+");
    System.out.println(taggedSample);

    double totalScore = 0;
    SWN3 test = new SWN3();
    System.out.println("-----------");
    for (int i = 0; i < taggedWords.length; i++) {
        // The tag is whatever follows "word_" in the tagged token
        String tail = taggedWords[i].substring(words[i].length() + 1);
        Double score = test.extract(words[i], tail);
        System.out.println(taggedWords[i] + "\t" + words[i] + "\t" + tail + "\t" + score);
        // Words without a SentiWordNet entry for this POS contribute nothing
        if (score == null)
            continue;
        totalScore += score;
    }
    System.out.println("-----------");
    System.out.println(totalScore);
}

I used a different sentence in sample because it shows the effect better. Note that tagging the whole sentence and tagging each word individually can lead to different results, because the tagger uses the surrounding words as context.
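The loop in the main method relies on words[i] and taggedWords[i] staying aligned. As a hedged alternative, both the word and the tag can be read from the same word_TAG token, so no parallel array is needed. The sketch below assumes the tagger's default underscore separator. The class name TaggedTokens and the method parse are made up for illustration, and the input string is hand-written in the format tagString prints, not captured tagger output.

```java
import java.util.ArrayList;
import java.util.List;

public class TaggedTokens {

    // Splits "word_TAG word_TAG ..." into {word, tag} pairs.
    // The last underscore is used, so a word that itself contains '_' stays intact.
    static List<String[]> parse(String tagged) {
        List<String[]> pairs = new ArrayList<>();
        for (String token : tagged.trim().split("\\s+")) {
            int sep = token.lastIndexOf('_');
            // Skip anything that is not of the form word_TAG
            if (sep <= 0 || sep == token.length() - 1)
                continue;
            pairs.add(new String[] { token.substring(0, sep), token.substring(sep + 1) });
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Hand-written sample in the tagger's output format (an assumption, not real output)
        String tagged = "This_DT is_VBZ a_DT sample_NN text_NN ";
        for (String[] pair : parse(tagged))
            System.out.println(pair[0] + "\t" + pair[1]);
    }
}
```

Each pair can then be passed to extract(word, tag) directly, which also removes the need to strip punctuation before tagging just to keep the two arrays in step.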

I hope it helps.
