
How to extract all words between certain special characters from a string which has no spaces?

I have a string that a website returns after parsing the content of a tweet. Here is the string:

"1\\tI\\t_\\tPRP\\tPRP\\t_\\t2\\tnsubj\\t_\\t_\\n2\\tneed\\t_\\tVB\\tVBP\\t_\\t0\\tnull\\t_\\t_\\n3\\tmore\\t_\\tJJ\\tJJR\\t_\\t4\\tamod\\t_\\t_\\n4\\twords\\t_\\tNN\\tNNS\\t_\\t2\\tdobj\\t_\\t_\\n5\\tlike\\t_\\tIN\\tIN\\t_\\t4\\tprep\\t_\\t_\\n6\\tmarvel\\t_\\tNN\\tNN\\t_\\t5\\tpobj\\t_\\t_\\n7\\tor\\t_\\tCC\\tCC\\t_\\t6\\tcc\\t_\\t_\\n8\\tcat\\t_\\tNN\\tNN\\t_\\t6\\tconj\\t_\\t_\\n9\\tor\\t_\\tCC\\tCC\\t_\\t6\\tcc\\t_\\t_\\n10\\tpancake\\t_\\tNN\\tNN\\t_\\t6\\tconj\\t_\\t_\\n11\\tor\\t_\\tCC\\tCC\\t_\\t10\\tcc\\t_\\t_\\n12\\tfrance\\t_\\tNN\\tNN\\t_\\t10\\tconj\\t_\\t_", "text": "I need more words like marvel or cat or pancake or france"

I want to get all the words that sit between "\\t" and "\\t_\\tNN". In other words, I want the nouns. The expected output is "words", "marvel", "cat", "pancake", "france".

I tried the code below:

private void regex(String s) {
    if (s.indexOf("error") >= 1) {
        Toast.makeText(this, "Sorry the site failed again it's not my fault :(",
                Toast.LENGTH_SHORT).show();
    } else {
        Pattern pattern = Pattern.compile("\t(.*?)\t_\tNN");
        Matcher matcher = pattern.matcher(s);
        System.out.println(s);
        if (matcher.find()) {
            String result = matcher.group(1);
            System.out.println(result);
        }
    }
}

I am sure I got the Pattern.compile string wrong. It is not working; it seems it can't find the words I want.

Could anybody tell me how I should fix it?

P.S. About the tab-like "\t" sequences: I printed the whole website response as the result, but once I have it as a string I guess they are just a backslash followed by a "t" rather than real tab characters.

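If, as you suspect, the string contains a literal backslash followed by "t" rather than a real tab, the backslash has to be escaped twice: once for the Java string literal and once for the regex engine. You can use the following pattern: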

"\\\\t([^\\\\]*?)\\\\t_\\\\tNN"
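To show how the pattern might be applied, here is a minimal sketch. It assumes the string really holds a single literal backslash followed by "t" (not a tab character); the class name `NounExtractor` and the method `extractNouns` are made up for illustration and are not part of the original question. It also replaces the `if (matcher.find())` from the question with a `while` loop, since `find()` returns one match per call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named here only for illustration.
class NounExtractor {

    // Java literal "\\\\t" becomes the regex \\t, which matches a backslash followed by 't'.
    // [^\\\\]*? captures the word: any run of characters that are not a backslash.
    private static final Pattern NOUN =
            Pattern.compile("\\\\t([^\\\\]*?)\\\\t_\\\\tNN");

    // Collects every captured word; find() advances to the next match on each call.
    static List<String> extractNouns(String parsed) {
        List<String> nouns = new ArrayList<>();
        Matcher matcher = NOUN.matcher(parsed);
        while (matcher.find()) {
            nouns.add(matcher.group(1));
        }
        return nouns;
    }

    public static void main(String[] args) {
        // Shortened sample; "\\t" in Java source is a backslash plus 't' at runtime.
        String sample = "3\\tmore\\t_\\tJJ\\tJJR\\t_\\t4\\tamod\\t_\\t_\\n"
                + "4\\twords\\t_\\tNN\\tNNS\\t_\\t2\\tdobj\\t_\\t_\\n"
                + "6\\tmarvel\\t_\\tNN\\tNN\\t_\\t5\\tpobj\\t_\\t_";
        System.out.println(extractNouns(sample)); // prints [words, marvel]
    }
}
```

Because the pattern ends at "NN", it also accepts any tag that merely starts with those letters, such as NNS. If the input turns out to contain real tab characters instead, a pattern along the lines of "\t([^\t]*)\t_\tNN" would be the one to try.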

See Ideone Demo

See RegEx Demo
