Meaning behind underscore character in regular expressions

Note: My program works; I am just looking for an explanation as to why.

I have a Java program that reads a file, counts the words in that file, and writes the words and their counts to another file. In the first part of my program I use a regular expression to replace any character that is not a letter with an empty string.

freq.add(in.next().replaceAll("[^A-Za-z]", ""));

This, however, does not account for hyphenated words, so I changed the regex to:

freq.add(in.next().replaceAll("[^A-Za-z_-]", ""));

My question is, why does adding the underscore and hyphen work? What is the meaning behind the underscore character?

While I'm asking questions, are regular expressions the same in all languages?

Also, if this is answered somewhere else I apologize; I did numerous searches with no luck.

There's nothing special about an underscore in a regular expression; it's just a normal character like A. A hyphen at the end of a character class isn't special either, although it is special when it sits between two other characters, where it defines a range, as you've used it to match all uppercase letters by saying A-Z. Your character class starts with ^, so it is negated: replaceAll removes every character that is not listed. Adding _ and - to the list simply means those two characters are now kept along with the letters.
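To make the difference concrete, here is a small sketch comparing the two patterns from the question on a couple of sample tokens. The class name `HyphenDemo`, the helper method names, and the sample strings are made up for illustration; only the two `replaceAll` patterns come from the question.

```java
public class HyphenDemo {
    // Original pattern: strips every character that is not an ASCII letter.
    static String lettersOnly(String token) {
        return token.replaceAll("[^A-Za-z]", "");
    }

    // Revised pattern: '_' is an ordinary character, and the trailing '-'
    // is a literal hyphen because it is not between two other characters.
    static String lettersUnderscoreHyphen(String token) {
        return token.replaceAll("[^A-Za-z_-]", "");
    }

    public static void main(String[] args) {
        System.out.println(lettersOnly("well-known,"));             // prints wellknown
        System.out.println(lettersUnderscoreHyphen("well-known,")); // prints well-known
        System.out.println(lettersUnderscoreHyphen("snake_case!")); // prints snake_case
    }
}
```

If the hyphen were placed between two characters instead of at the end, it would be read as a range operator, so keeping it last (or escaping it as `\\-` in a Java string) is probably the safest habit.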

Regular expressions are similar across most languages, but some of the more esoteric features can differ between languages or be missing from a language entirely.
