
How to build a regular expression (regex) for slang and emoticons

I need to build a regex to match slang (e.g. lol, lmao, imo) and emoticons (e.g. :), :P, ;) and so on).

I followed the example at http://www.coderanch.com/t/497238/java/java/Regular-Expression-Detecting-Emoticons . However, this approach is failing for me.

For example, let's say I need to match the slang "od". I create a Pattern as follows: Pattern pattern = Pattern.compile(Pattern.quote("od"));

Say I need to match the slang "od" in the test sentence "some methods are bad." Empirically, there is one match, inside the word "methods", which is not what I want.

I did read some of the Javadoc and some of the tutorials on Java and regex, but I still can't figure this out.

By the way, I am using Java 6 (though I've looked at and referenced the Java 5 API doc).

If regex is not the best way to go, I am open to other solutions as well. Thanks in advance for any help/pointers. The following code gets me 3 matches and is based on the link above.

String regex = "od";
Pattern pattern = Pattern.compile(Pattern.quote(regex));
String str = "some methods are bad od od more text";
Matcher matcher = pattern.matcher(str);
while(matcher.find()) {
    System.out.println(matcher.group());
}

The following code returns no matches and is based on the responses so far.

String regex = "\bod\b";
Pattern pattern = Pattern.compile(regex);
//Pattern pattern = Pattern.compile(Pattern.quote(regex)); //this fails
String str = "some methods are bad od od more text";
Matcher matcher = pattern.matcher(str);
while(matcher.find()) {
    System.out.println(matcher.group());
}
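
A likely reason the snippet above finds nothing: inside a Java string literal, \b is the escape for the backspace character (U+0008), so the regex engine never receives a word-boundary token. Passing the pattern through Pattern.quote fails for a related reason, since quoting turns the whole string into literal text. The sketch below compares the two spellings; the class and method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundaryEscapeDemo {
    // Counts how many times the regex matches in the text.
    static int countMatches(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String str = "some methods are bad od od more text";
        // "\bod\b" is backspace + "od" + backspace; str contains no backspaces.
        System.out.println(countMatches("\bod\b", str));   // prints 0
        // "\\bod\\b" reaches the regex engine as \bod\b (word boundaries).
        System.out.println(countMatches("\\bod\\b", str)); // prints 2
    }
}
```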

After the two helpful responses below, I will post the correct/desired code snippet here.

String regex = "(\\bod\\b)|(\\blmao\\b)";
Pattern pattern = Pattern.compile(regex);
String str = "some methods are bad od od more text lmao more text";
Matcher matcher = pattern.matcher(str);
while(matcher.find()) {
    System.out.println(matcher.group());
}

This code is correct, or as desired, because empirically it gives me 3 matches (2 od and 1 lmao). Sorry, I wish I were stronger with regex in Java (and with regex in general). Thanks for your help.

[:;]-?[DP()]

Handles the combinations of ":" or ";", plus an optional "-", followed by "D" or "P" or ")" or "(",
e.g. :P :-( ;D etc.

Just add more combinations...
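
The character-class pattern above can be tried out with a small harness like the one below. This is only a sketch: the class name EmoticonFinder, the method findEmoticons, and the sample sentence are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmoticonFinder {
    // Eyes (: or ;), an optional nose (-), then a mouth (D, P, ")" or "(").
    static final Pattern EMOTICON = Pattern.compile("[:;]-?[DP()]");

    // Returns every emoticon found in the text, in order of appearance.
    static List<String> findEmoticons(String text) {
        List<String> hits = new ArrayList<String>();
        Matcher m = EMOTICON.matcher(text);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        // prints [:), :-(, ;D, :P]
        System.out.println(findEmoticons("nice :) that hurt :-( just kidding ;D see you :P"));
    }
}
```

Note that \b is of little use around emoticons, because ":" and ";" are not word characters; the pattern as written will also match an emoticon embedded in other text.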

Have fun.

You can use word boundaries ( \\b when written in a Java string literal) to match a word that is exactly the slang you want.

So, for example, the pattern "\\bod\\b" will match "od", but won't match "method".
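
With more than a couple of slang terms, the word-boundary pattern can be generated from a list instead of typed by hand. The following is a minimal sketch under a few assumptions: the names SlangMatcher, buildPattern and findAll are hypothetical, and the case-insensitive flag is an added choice that is not part of the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SlangMatcher {
    // Builds \b(?:term1|term2|...)\b, quoting each term so it is taken literally.
    static Pattern buildPattern(List<String> terms) {
        StringBuilder sb = new StringBuilder("\\b(?:");
        for (int i = 0; i < terms.size(); i++) {
            if (i > 0) {
                sb.append('|');
            }
            sb.append(Pattern.quote(terms.get(i)));
        }
        sb.append(")\\b");
        // Assumption: slang should match regardless of case (LOL, lol, Lol).
        return Pattern.compile(sb.toString(), Pattern.CASE_INSENSITIVE);
    }

    // Collects every match of the pattern in the text, in order.
    static List<String> findAll(Pattern pattern, String text) {
        List<String> hits = new ArrayList<String>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        Pattern p = buildPattern(Arrays.asList("od", "lmao", "lol", "imo"));
        // prints [od, od, lmao]; the "od" inside "methods" is skipped
        System.out.println(findAll(p, "some methods are bad od od more text lmao more text"));
    }
}
```

Here Pattern.quote is applied to each term separately, never to the \b tokens, so the boundaries keep their regex meaning.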

Do you need to use a regex? I would do:

String str = "some methods are bad od od more text lmao more text";
String[] words = str.split(" ");
for (String s : words) {
  if (s.equals("od") || s.equals("lmao"))
    System.out.println(s);
}
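
The same split-and-compare idea scales to a longer slang list if the terms are kept in a Set. This is a sketch only: SlangTokenizer and findSlang are hypothetical names, and lower-casing each token is an added assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class SlangTokenizer {
    // Splits on runs of whitespace and keeps the tokens found in the slang set.
    static List<String> findSlang(String text, Set<String> slang) {
        List<String> hits = new ArrayList<String>();
        for (String token : text.split("\\s+")) {
            // Assumption: the set holds lower-case terms, so compare in lower case.
            if (slang.contains(token.toLowerCase(Locale.ROOT))) {
                hits.add(token);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Set<String> slang = new HashSet<String>(Arrays.asList("od", "lmao", "lol", "imo"));
        // prints [od, od, lmao]
        System.out.println(findSlang("some methods are bad od od more text lmao more text", slang));
    }
}
```

One limitation of this approach: a token with punctuation attached, such as "lmao!", will not be found unless the punctuation is stripped first.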


 