
What does this pattern mean?

I have a script for automating data analysis. Unfortunately, I do not know the format of the input data file. I found this piece of code, which is intended to check that the file matches a certain format before performing the analysis. Can you help me understand what the pattern means?

private static final Pattern oldFileHeaderPattern = (newFileHeaderPattern = Pattern.compile("\\s*^\\s*(-1|0|1)\\s+(-1|0|1)\\s*$.*", 40)).compile("\\s*^\\s*(1|0)\\s*$.*", 40)

This line is a master class in how not to write Java. Only a true master could pack so many blunders into one line.

  1. Can we talk about initializing two constants on one line? Don't do that. Don't ever do that. Pattern.compile() is a static method. Chaining static method calls is insanity.

     private static final Pattern oldFileHeaderPattern = Pattern.compile("\\s*^\\s*(1|0)\\s*$.*", 40);
     private static final Pattern newFileHeaderPattern = Pattern.compile("\\s*^\\s*(-1|0|1)\\s+(-1|0|1)\\s*$.*", 40);
  2. Hard coding the magic number 40 hurts my soul. You're supposed to OR together different named constants if you want multiple flags. Not write out the number.

     private static final Pattern oldFileHeaderPattern = Pattern.compile("\\s*^\\s*(1|0)\\s*$.*", Pattern.DOTALL | Pattern.MULTILINE);
     private static final Pattern newFileHeaderPattern = Pattern.compile("\\s*^\\s*(-1|0|1)\\s+(-1|0|1)\\s*$.*", Pattern.DOTALL | Pattern.MULTILINE);
  3. Now let's talk about \\s*^ and $.*. Matching things before and after ^ and $ anchors is suspect. Normally you put these at the start and end of your regex to require the regex to match a full line and you call it a day.

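    Using * means they can match zero characters, so they don't actually change whether a match is found when the pattern is searched for with find(). (They would matter with matches(), which requires the whole input to match.) Let's remove them and just use ^ and $. That means we can get rid of DOTALL, too, since . is gone.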

     private static final Pattern oldFileHeaderPattern = Pattern.compile("^\\s*(1|0)\\s*$", Pattern.MULTILINE);
     private static final Pattern newFileHeaderPattern = Pattern.compile("^\\s*(-1|0|1)\\s+(-1|0|1)\\s*$", Pattern.MULTILINE);
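
If you want to convince yourself of steps 2 and 3, here is a rough sketch that checks them. The class name HeaderPatternCheck, the agree helper, and the sample inputs are made up for illustration, and it assumes the script searches with find(), which the question doesn't show.

```java
import java.util.regex.Pattern;

class HeaderPatternCheck {
    // The pattern as written in the question, magic number and all.
    static final Pattern ORIGINAL = Pattern.compile("\\s*^\\s*(1|0)\\s*$.*", 40);
    // The simplified pattern from step 3.
    static final Pattern SIMPLIFIED = Pattern.compile("^\\s*(1|0)\\s*$", Pattern.MULTILINE);

    /** True when both patterns give the same find() result for the input. */
    static boolean agree(String input) {
        return ORIGINAL.matcher(input).find() == SIMPLIFIED.matcher(input).find();
    }

    public static void main(String[] args) {
        // 40 is MULTILINE (8) OR-ed with DOTALL (32).
        System.out.println((Pattern.DOTALL | Pattern.MULTILINE) == 40);

        // Hypothetical inputs; the real file format is not known.
        String[] samples = { "1", "  0  ", "\n\n1\nrest of file", "2", "1 0", "data\n0\n" };
        for (String s : samples) {
            System.out.println(agree(s));
        }
    }
}
```

Every line printed should be true. If the script calls matches() instead, the two versions are no longer interchangeable, so check how the pattern is used before simplifying it.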

The regexes don't look so bad now, do they? The first one looks for a line consisting of 1 or 0 with optional whitespace on either side. The second one looks for a line with two numbers, each being -1, 0, or 1.
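
To see what that means for actual input, here is a minimal sketch that classifies some text by which header pattern it contains. The class name HeaderPatternDemo, the classify helper, and the sample file contents are all hypothetical; I'm guessing at what a data file might look like, and assuming the patterns are applied with find().

```java
import java.util.regex.Pattern;

class HeaderPatternDemo {
    static final Pattern OLD_HEADER =
            Pattern.compile("^\\s*(1|0)\\s*$", Pattern.MULTILINE);
    static final Pattern NEW_HEADER =
            Pattern.compile("^\\s*(-1|0|1)\\s+(-1|0|1)\\s*$", Pattern.MULTILINE);

    /** Returns "new", "old", or "none" depending on which pattern finds a match. */
    static String classify(String text) {
        if (NEW_HEADER.matcher(text).find()) {
            return "new";
        }
        if (OLD_HEADER.matcher(text).find()) {
            return "old";
        }
        return "none";
    }

    public static void main(String[] args) {
        // Made-up file contents: a header line followed by a data line.
        System.out.println(classify("1\n3.5 4.2\n"));    // old
        System.out.println(classify("-1 0\n3.5 4.2\n")); // new
        System.out.println(classify("3.5 4.2\n"));       // none
    }
}
```

One thing to watch for: \s also matches line terminators, so the "two numbers" in the new pattern can sit on consecutive lines. With these patterns, classify("1\n0") comes back as "new", not "old".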


 