How to extract all words between certain special characters from a string without spaces?

Question

I have a string that was fetched from a website which parses the content of a tweet. Here is the string:

"1\\tI\\t_\\tPRP\\tPRP\\t_\\t2\\tnsubj\\t_\\t_\\n2\\tneed\\t_\\tVB\\tVBP\\t_\\t0\\tnull\\t_\\t_\\n3\\tmore\\t_\\tJJ\\tJJR\\t_\\t4\\tamod\\t_\\t_\\n4\\twords\\t_\\tNN\\tNNS\\t_\\t2\\tdobj\\t_\\t_\\n5\\tlike\\t_\\tIN\\tIN\\t_\\t4\\tprep\\t_\\t_\\n6\\tmarvel\\t_\\tNN\\tNN\\t_\\t5\\tpobj\\t_\\t_\\n7\\tor\\t_\\tCC\\tCC\\t_\\t6\\tcc\\t_\\t_\\n8\\tcat\\t_\\tNN\\tNN\\t_\\t6\\tconj\\t_\\t_\\n9\\tor\\t_\\tCC\\tCC\\t_\\t6\\tcc\\t_\\t_\\n10\\tpancake\\t_\\tNN\\tNN\\t_\\t6\\tconj\\t_\\t_\\n11\\tor\\t_\\tCC\\tCC\\t_\\t10\\tcc\\t_\\t_\\n12\\tfrance\\t_\\tNN\\tNN\\t_\\t10\\tconj\\t_\\t_", "text": "I need more words like marvel or cat or pancake or france"

I want to get all the words that sit between "\\t" and "\\t_\\tNN". In other words, I want the nouns: the expected output is "words", "marvel", "cat", "pancake", "france".

I tried the code below:

private void regex(String s){
        if(s.indexOf("error") >= 1){
            Toast.makeText(this, "Sorry the site failed again it's not my fault :(",
                       Toast.LENGTH_SHORT).show();
        }
        else{
            Pattern pattern = Pattern.compile("\t(.*?)\t_\tNN");
            Matcher matcher = pattern.matcher(s);
            System.out.println(s);
            if (matcher.find()) {
                String result = matcher.group(1);
                System.out.println(result);
            }
        }

    }

I am sure I got the Pattern.compile string wrong. It is not working; it seems it can't find the words I want.

Could anybody tell me how I should fix it?

PS: About the tab-character lookalike "\t": I printed the whole website response as the result, but when I get the result as a string, I guess each one becomes just a backslash and a "t" instead of still being a tab character.

Answer 1

You can use the following:

"\\\\t([^\\\\]*?)\\\\t_\\\\tNN"

See Ideone Demo
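A small sketch of why the pattern above is written with four backslashes. It assumes, as the question's PS suggests, that the fetched string holds a backslash followed by `t` rather than real tab characters. The class name `EscapeLevels` and the helper `found` are made up for illustration.

```java
import java.util.regex.Pattern;

class EscapeLevels {
    // Returns true if the regex (given as a Java string) matches anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String realTab = "a\tb";      // 3 characters: a, TAB, b
        String backslashT = "a\\tb";  // 4 characters: a, backslash, t, b

        // "\t" in Java source is a real tab, so the regex looks for a tab character.
        System.out.println(found("\t", realTab));        // true
        System.out.println(found("\t", backslashT));     // false

        // "\\\\t" in Java source is the regex \\t: a literal backslash, then the letter t.
        System.out.println(found("\\\\t", realTab));     // false
        System.out.println(found("\\\\t", backslashT));  // true
    }
}
```

If the input did turn out to contain real tabs, the original `"\t"` form would be the one that matches, so it may be worth checking the raw string before choosing.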

See RegEx Demo
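A minimal sketch of how the pattern might be wired into the asker's method, assuming the input contains literal backslash-`t` sequences. It swaps the single `if (matcher.find())` from the question for a `while` loop, so that every match is collected rather than only the first. `NounExtractor` and `extractNouns` are hypothetical names, and the sample is a shortened excerpt of the string from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NounExtractor {
    // Literal backslash + 't', a run of non-backslash characters (the word),
    // then the literal text \t_\tNN. Each "\\\\" in Java source is one
    // escaped backslash for the regex engine.
    private static final Pattern NOUN =
            Pattern.compile("\\\\t([^\\\\]*?)\\\\t_\\\\tNN");

    // Collects every captured word, not just the first one.
    static List<String> extractNouns(String s) {
        List<String> nouns = new ArrayList<>();
        Matcher matcher = NOUN.matcher(s);
        while (matcher.find()) {
            nouns.add(matcher.group(1));
        }
        return nouns;
    }

    public static void main(String[] args) {
        // "\\t" in Java source is a backslash followed by 't', not a tab.
        String sample = "3\\tmore\\t_\\tJJ\\tJJR\\t_\\t4\\tamod\\t_\\t_\\n"
                + "4\\twords\\t_\\tNN\\tNNS\\t_\\t2\\tdobj\\t_\\t_\\n"
                + "8\\tcat\\t_\\tNN\\tNN\\t_\\t6\\tconj\\t_\\t_";
        System.out.println(extractNouns(sample)); // prints [words, cat]
    }
}
```

Excluding the backslash inside the capture group (`[^\\\\]`) keeps a match from spilling across column boundaries, which is why "more" (tagged JJ) is skipped instead of being swallowed into a longer match.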

Answer 1
1 Accepted 2015-06-08 21:10:49
