How to extract all words between certain special characters from a string which has no spaces?
I have a string that is the result of parsing a tweet's content, fetched from a website. Here is the string:
"1\\tI\\t_\\tPRP\\tPRP\\t_\\t2\\tnsubj\\t_\\t_\\n2\\tneed\\t_\\tVB\\tVBP\\t_\\t0\\tnull\\t_\\t_\\n3\\tmore\\t_\\tJJ\\tJJR\\t_\\t4\\tamod\\t_\\t_\\n4\\twords\\t_\\tNN\\tNNS\\t_\\t2\\tdobj\\t_\\t_\\n5\\tlike\\t_\\tIN\\tIN\\t_\\t4\\tprep\\t_\\t_\\n6\\tmarvel\\t_\\tNN\\tNN\\t_\\t5\\tpobj\\t_\\t_\\n7\\tor\\t_\\tCC\\tCC\\t_\\t6\\tcc\\t_\\t_\\n8\\tcat\\t_\\tNN\\tNN\\t_\\t6\\tconj\\t_\\t_\\n9\\tor\\t_\\tCC\\tCC\\t_\\t6\\tcc\\t_\\t_\\n10\\tpancake\\t_\\tNN\\tNN\\t_\\t6\\tconj\\t_\\t_\\n11\\tor\\t_\\tCC\\tCC\\t_\\t10\\tcc\\t_\\t_\\n12\\tfrance\\t_\\tNN\\tNN\\t_\\t10\\tconj\\t_\\t_", "text": "I need more words like marvel or cat or pancake or france"
I want to get all the words that sit between "\\t" and "\\t_\\tNN"; in other words, I want the nouns. The expected output is "words", "marvel", "cat", "pancake", "france".
I tried the code below:
private void regex(String s) {
    if (s.indexOf("error") >= 1) {
        Toast.makeText(this, "Sorry the site failed again it's not my fault :(",
                Toast.LENGTH_SHORT).show();
    } else {
        Pattern pattern = Pattern.compile("\t(.*?)\t_\tNN");
        Matcher matcher = pattern.matcher(s);
        System.out.println(s);
        if (matcher.find()) {
            String result = matcher.group(1);
            System.out.println(result);
        }
    }
}
I am sure I got the Pattern.compile string wrong. It's not working; it can't seem to find the words I want.
Could anybody tell me how I should fix it?
P.S. About the "\t" sequences that look like tab characters: I printed the whole website response as the result, but when I get the result as a string, I guess they are just a backslash followed by a "t" rather than real tab characters.
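The guess in the P.S. can be checked directly. The following is a minimal sketch (the class name EscapeDemo and the helper found are made up for illustration) showing that a real tab and a backslash followed by "t" are different inputs, and that each needs a different regex:

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original code.
class EscapeDemo {
    // Returns true if the regex matches anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String realTab = "\t";      // one character: the tab, U+0009
        String backslashT = "\\t";  // two characters: a backslash and 't'

        System.out.println(realTab.length());     // 1
        System.out.println(backslashT.length());  // 2

        // Java source "\\t" is the regex \t, which matches a real tab only.
        System.out.println(found("\\t", backslashT));    // false
        // Java source "\\\\t" is the regex \\t: a literal backslash, then 't'.
        System.out.println(found("\\\\t", backslashT));  // true
    }
}
```

If the fetched string really contains a backslash and a "t", a pattern written with "\t" cannot match it, which is consistent with the behavior described above.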
You can use the following:
"\\\\t([^\\\\]*?)\\\\t_\\\\tNN"
See Ideone Demo
See RegEx Demo
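Two things differ from the code in the question. First, each "\\\\" in Java source becomes "\\" in the regex, which matches one literal backslash in the input, and [^\\\\] keeps the captured word from running across the next backslash. Second, matcher.find() has to be called in a loop, because a single if only returns the first match. Below is a rough sketch of how this could be wired together, assuming the input contains a literal backslash followed by "t" as described in the P.S.; the class name NounExtractor, the method extractNouns, and the shortened sample input are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original code.
class NounExtractor {
    // Regex: backslash + 't', a captured run of non-backslash characters,
    // then the literal text \t_\tNN (backslashes are literal, not tabs).
    private static final Pattern NOUN =
            Pattern.compile("\\\\t([^\\\\]*?)\\\\t_\\\\tNN");

    public static List<String> extractNouns(String s) {
        List<String> nouns = new ArrayList<>();
        Matcher matcher = NOUN.matcher(s);
        // Loop so that every match is collected, not just the first one.
        while (matcher.find()) {
            nouns.add(matcher.group(1));
        }
        return nouns;
    }

    public static void main(String[] args) {
        // Shortened sample: three rows from the string in the question.
        String s = "4\\twords\\t_\\tNN\\tNNS\\t_\\t2\\tdobj\\t_\\t_\\n"
                 + "5\\tlike\\t_\\tIN\\tIN\\t_\\t4\\tprep\\t_\\t_\\n"
                 + "6\\tmarvel\\t_\\tNN\\tNN\\t_\\t5\\tpobj\\t_\\t_";
        System.out.println(extractNouns(s)); // [words, marvel]
    }
}
```

Note that the pattern ends in NN without a following delimiter, so it would also accept any tag that merely starts with NN in that column.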