简体   繁体   中英

How would you translate this JavaScript regex to Java?

How would you translate this JavaScript regex to Java?

It removes punctuation from a string:

strippedStr = str.replace(/[\.,-\/#!$%\^&\*;:{}=\-_`~()]/g,"");

Don't need the s/// stuff in there, you just pass the regular expression, and here it's a character class.

public static void main(String [] args)
{
    String s = ".,/#!$%^&*;:{}=-_`~()./#hello#&%---#(($";
    String n = s.replaceAll("[-.,/#!$%^&*;:{}=_`~()]", "");
    System.out.println(n); // should print "hello"

    // using POSIX character class which matches any of:
    // !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
    String p = s.replaceAll("\\p{Punct}", "");
    System.out.println(p);
}

Syntax for Java Regular Expressions here: http://download.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html#sum

The POSIX character class used above covers a little more than you have, so I'm not sure if it fits your needs.

  • edited because you don't need to escape . in the character class.
  • edited to move - to beginning of character class so it has no special meaning

If you expect this to work with all punctuation instead of just ASCII, you need to use:

String new_string = old_string.replaceAll("[\\pS\\pP]", "");

That's because some of the things you are calling punctuation are actually symbols, as revealed by my uniprops script :

$ uniprops - \\ . , / '#' ! '$' % ^ '&' '*' ';' : { } = _ '`' '~' '(' ')'
   U+002D ‹-› \N{ HYPHEN-MINUS }:
    \pP \p{Pd}
    All Any ASCII Assigned Common Zyyy Dash Dash_Punctuation Pd P Gr_Base Grapheme_Base Graph GrBase Punct Pat_Syn Pattern_Syntax PatSyn PosixGraph PosixPrint PosixPunct Print Punctuation XPosixGraph XPosixPrint XPosixPunct
U+005C ‹\› \N{ REVERSE SOLIDUS }:
    \pP \p{Po}
    All Any ASCII Assigned Common Zyyy Po P Gr_Base Grapheme_Base Graph GrBase Other_Punctuation Punct Pat_Syn Pattern_Syntax PatSyn PosixGraph PosixPrint PosixPunct Print Punctuation XPosixGraph XPosixPrint XPosixPunct
U+002E ‹.› \N{ FULL STOP }:
    \pP \p{Po}
    All Any ASCII Assigned Case_Ignorable CI Common Zyyy Po P Gr_Base Grapheme_Base Graph GrBase Other_Punctuation Punct Pat_Syn Pattern_Syntax PatSyn PosixGraph PosixPrint PosixPunct Print Punctuation STerm Term Terminal_Punctuation XPosixGraph XPosixPrint XPosixPunct
U+002C ‹,› \N{ COMMA }:
    \pP \p{Po}
    All Any ASCII Assigned Common Zyyy Po P Gr_Base Grapheme_Base Graph GrBase Other_Punctuation Punct Pat_Syn Pattern_Syntax PatSyn PosixGraph PosixPrint PosixPunct Print Punctuation Term Terminal_Punctuation XPosixGraph XPosixPrint XPosixPunct
U+002F ‹/› \N{ SOLIDUS }:
    \pP \p{Po}
    All Any ASCII Assigned Common Zyyy Po P Gr_Base Grapheme_Base Graph GrBase Other_Punctuation Punct Pat_Syn Pattern_Syntax PatSyn PosixGraph PosixPrint PosixPunct Print Punctuation XPosixGraph XPosixPrint XPosixPunct
U+0023 ‹#› \N{ NUMBER SIGN }:
    \pP \p{Po}
    All Any ASCII Assigned Common Zyyy Po P Gr_Base Grapheme_Base Graph GrBase Other_Punctuation Punct Pat_Syn Pattern_Syntax PatSyn PosixGraph PosixPrint PosixPunct Print Punctuation XPosixGraph XPosixPrint XPosixPunct
U+0021 ‹!› \N{ EXCLAMATION MARK }:
    \pP \p{Po}
    All Any ASCII Assigned Common Zyyy Po P Gr_Base Grapheme_Base Graph GrBase Other_Punctuation Punct Pat_Syn Pattern_Syntax PatSyn PosixGraph PosixPrint PosixPunct Print Punctuation STerm Term Terminal_Punctuation XPosixGraph XPosixPrint XPosixPunct
U+0024 ‹$› \N{ DOLLAR SIGN }:
    \pS \p{Sc}
    All Any ASCII Assigned Common Zyyy Currency_Symbol Sc S Gr_Base Grapheme_Base Graph GrBase Pat_Syn Pattern_Syntax PatSyn PosixGraph PosixPrint PosixPunct Print Symbol XPosixGraph XPosixPrint XPosixPunct
U+0025 ‹%› \N{ PERCENT SIGN }:
    \pP \p{Po}
    All Any ASCII Assigned Common Zyyy Po P Gr_Base Grapheme_Base Graph GrBase Other_Punctuation Punct Pat_Syn Pattern_Syntax PatSyn PosixGraph PosixPrint PosixPunct Print Punctuation XPosixGraph XPosixPrint XPosixPunct
U+005E ‹^› \N{ CIRCUMFLEX ACCENT }:
    \pS \p{Sk}
    All Any ASCII Assigned Case_Ignorable CI Common Zyyy Dia Diacritic Sk S Gr_Base Grapheme_Base Graph GrBase Math Modifier_Symbol Pat_Syn Pattern_Syntax PatSyn PosixGraph PosixPrint PosixPunct Print Symbol XPosixGraph XPosixPrint XPosixPunct
U+0026 ‹&› \N{ AMPERSAND }:
    \pP \p{Po}
    All Any ASCII Assigned Common Zyyy Po P Gr_Base Grapheme_Base Graph GrBase Other_Punctuation Punct Pat_Syn Pattern_Syntax PatSyn PosixGraph PosixPrint PosixPunct Print Punctuation XPosixGraph XPosixPrint XPosixPunct
U+002A ‹*› \N{ ASTERISK }:
    \pP \p{Po}
    All Any ASCII Assigned Common Zyyy Po P Gr_Base Grapheme_Base Graph GrBase Other_Punctuation Punct Pat_Syn Pattern_Syntax PatSyn PosixGraph PosixPrint PosixPunct Print Punctuation XPosixGraph XPosixPrint XPosixPunct
U+003B ‹;› \N{ SEMICOLON }:
    \pP \p{Po}
    All Any ASCII Assigned Common Zyyy Po P Gr_Base Grapheme_Base Graph GrBase Other_Punctuation Punct Pat_Syn Pattern_Syntax PatSyn PosixGraph PosixPrint PosixPunct Print Punctuation Term Terminal_Punctuation XPosixGraph XPosixPrint XPosixPunct
U+003A ‹:› \N{ COLON }:
    \pP \p{Po}
    All Any ASCII Assigned Case_Ignorable CI Common Zyyy Po P Gr_Base Grapheme_Base Graph GrBase Other_Punctuation Punct Pat_Syn Pattern_Syntax PatSyn PosixGraph PosixPrint PosixPunct Print Punctuation Term Terminal_Punctuation XPosixGraph XPosixPrint XPosixPunct
U+007B ‹{› \N{ LEFT CURLY BRACKET }:
    \pP \p{Ps}
    All Any ASCII Assigned Bidi_M Bidi_Mirrored BidiM Common Zyyy Ps P Gr_Base Grapheme_Base Graph GrBase Open_Punctuation Punct Pat_Syn Pattern_Syntax PatSyn PosixGraph PosixPrint PosixPunct Print Punctuation XPosixGraph XPosixPrint XPosixPunct
U+007D ‹}› \N{ RIGHT CURLY BRACKET }:
    \pP \p{Pe}
    All Any ASCII Assigned Bidi_M Bidi_Mirrored BidiM Close_Punctuation Pe Common Zyyy P Gr_Base Grapheme_Base Graph GrBase Punct Pat_Syn Pattern_Syntax PatSyn PosixGraph PosixPrint PosixPunct Print Punctuation XPosixGraph XPosixPrint XPosixPunct
U+003D ‹=› \N{ EQUALS SIGN }:
    \pS \p{Sm}
    All Any ASCII Assigned Common Zyyy Sm S Gr_Base Grapheme_Base Graph GrBase Math Math_Symbol Pat_Syn Pattern_Syntax PatSyn PosixGraph PosixPrint PosixPunct Print Symbol XPosixGraph XPosixPrint XPosixPunct
U+005F ‹_› \N{ LOW LINE }:
    \w \pP \p{Pc}
    All Any ASCII Assigned Common Zyyy Connector_Punctuation Pc P Gr_Base Grapheme_Base Graph GrBase ID_Continue IDC Punct PerlWord PosixGraph PosixPrint PosixPunct PosixWord Print Punctuation Word XID_Continue XIDC XPosixGraph XPosixPrint XPosixPunct XPosixWord
U+0060 ‹`› \N{ GRAVE ACCENT }:
    \pS \p{Sk}
    All Any ASCII Assigned Case_Ignorable CI Common Zyyy Dia Diacritic Sk S Gr_Base Grapheme_Base Graph GrBase Modifier_Symbol Pat_Syn Pattern_Syntax PatSyn PosixGraph PosixPrint PosixPunct Print Symbol XPosixGraph XPosixPrint XPosixPunct
U+007E ‹~› \N{ TILDE }:
    \pS \p{Sm}
    All Any ASCII Assigned Common Zyyy Sm S Gr_Base Grapheme_Base Graph GrBase Math Math_Symbol Pat_Syn Pattern_Syntax PatSyn PosixGraph PosixPrint PosixPunct Print Symbol XPosixGraph XPosixPrint XPosixPunct
U+0028 ‹(› \N{ LEFT PARENTHESIS }:
    \pP \p{Ps}
    All Any ASCII Assigned Bidi_M Bidi_Mirrored BidiM Common Zyyy Ps P Gr_Base Grapheme_Base Graph GrBase Open_Punctuation Punct Pat_Syn Pattern_Syntax PatSyn PosixGraph PosixPrint PosixPunct Print Punctuation XPosixGraph XPosixPrint XPosixPunct
U+0029 ‹)› \N{ RIGHT PARENTHESIS }:
    \pP \p{Pe}
    All Any ASCII Assigned Bidi_M Bidi_Mirrored BidiM Close_Punctuation Pe Common Zyyy P Gr_Base Grapheme_Base Graph GrBase Punct Pat_Syn Pattern_Syntax PatSyn PosixGraph PosixPrint PosixPunct Print Punctuation XPosixGraph XPosixPrint XPosixPunct

Not a Java guy but:

public static final String expression = "[\\s\\p{Punct}]";

Source: C# equivalent of Java Punctuation regex

The g at the end of the regexp means it's global; the equivalent Java String method is replaceAll() (which takes a search regexp and a replacement string). The only thing you need to do to the regexp itself is escape the \\ s, since the Java parser will interpret things like \\. as you trying to escape a . :

String strippedStr = str.replaceAll("[\\.,-\\/#!$%\\^&\\*;:{}=\\-_`~()]", "");

这个角色类不会以更简洁的方式完成工作吗?

str.replaceAll("[\\W_]", "");

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM