
Using pattern matching to sort from a file, Java

I've gotten my program to the point where it correctly separates the lines of the text file and can even match the pattern for the first line of text. I also need to detect and separate the address lines of the text file and sort them by direction or by street/Broadway, but I can't even get the initial address patterns to match. Am I using regex wrong, and is that why the address portion isn't detected properly?

Code

package csi311;

// Import some standard Java libraries.
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.regex.Pattern;
import java.util.regex.Matcher;
import java.util.ArrayList;
/**
 * Hello world example.  Shows passing in command line arguments, in this case a filename. 
 * If the filename is given, read in the file and echo it to stdout.
 */
public class HelloCsi311 {

    /**
     * Class constructor.
     */
    public HelloCsi311() {
    }


    /**
     * @param filename the name of a file to read in 
     * @throws Exception on anything bad happening 
     */
    public void run(String filename) throws Exception {
        if (filename != null) {
            readFile(filename); 
        }
    }


    /**
     * @param filename the name of a file to read in 
     * @throws Exception on anything bad happening 
     */
    private void readFile(String filename) throws Exception {
        System.out.println("Dumping file " + filename); 
        // Open the file and connect it to a buffered reader.
        BufferedReader br = new BufferedReader(new FileReader(filename));  
        ArrayList<String> foundaddr = new ArrayList<String>();
        String line = null;  
        String pattern = "^\\d\\d\\d-[A-Za-z][A-Za-z][A-Za-z]-\\d\\d\\d\\d";
        String address[] = new String[4];
        address[0] = "\\d{1,3}\\s\\[A-Za-z]{1,20}";
        address[1] = "\\d{1,3}\\s\\[A-Za-z]{1,20}\\s\\d{1,3}\\[A-Za-z]{1,20}\\s\\[A-Za-z]{1,20}";
        address[2] = "\\d{1,3}\\s\\d{1,3}\\[A-Za-z]{1,20}\\s\\[A-Za-z]{1,20}";
        address[3] = "\\d\\d\\s\\[A-Za-z]{1,20}";
        Pattern r = Pattern.compile(pattern);
        // Get lines from the file one at a time until there are no more.
        while ((line = br.readLine()) != null) {
            if(line.trim().isEmpty()) {
                continue;
            }
            String sample = line.replaceAll("\\s+,", ",").replaceAll(",+\\s",",");
            String[] result = sample.split(",");
            String pkgId = result[0].trim().toUpperCase();
            String pkgAddr = result[1].trim();


            Float f = Float.valueOf(result[2]);
            for(String str : result){
                // Trying to match for different types
                for(String pat : address){
                    if(str.matches(pat)){
                        System.out.println(pat);
                    }
                }



                if(f < 50 && !pkgId.matches(pattern)) {
                    Matcher m = r.matcher(str);
                    if(m.find()) {
                        foundaddr.add(str);
                    }
                }
            }
        }

        if(foundaddr != null) {
            System.out.println(foundaddr.size());
        }   

        // Close the buffer and the underlying file.
        br.close();
    }



    /**
     * @param args filename
     */
    public static void main(String[] args) {
        // Make an instance of the class.
        HelloCsi311 theApp = new HelloCsi311();
        String filename = null; 
        // If a command line argument was given, use it as the filename.
        if (args.length > 0) {
            filename = args[0]; 
        }
        try { 
            // Run the run(), passing in the filename, null if not specified.
            theApp.run(filename);
        }
        catch (Exception e) {
            // If anything bad happens, report it.
            System.out.println("Something bad happened!");
            e.printStackTrace();
        }    
    }
}

Text File

123-ABC-4567, 15 W. 15th St., 50.1
456-BGT-9876,22 Broadway,24
QAZ-456-QWER, 100 East 20th Street,50
Q2Z-457-QWER, 200 East 20th Street, 49
6785-FGH-9845 ,45 5th Ave, 12.2,
678-FGH-9846 ,45 5th Ave, 12.2

123-ABC-9999, 46 Foo Bar, 220.0
347-poy-3465, 101 B'way,24

Below are the lines of code that should process the address lines, but for some reason the patterns never match. The split output separates the address lines correctly, which can be seen from the print statement above the for loop that deals with the addresses, yet the address lines aren't detected as matches and I'm confused as to why.

Lines of code with the issue

  for(String str : result){
      //System.out.println(str);
      // Trying to match for different types
      for(String pat : address){
          if(str.matches(pat)){
              System.out.println(pat);
          }
      }

Desired Output (edited as requested)

22 Broadway
45 5th Ave
101 B'way

I believe the problem is with your regex. Take the Java string \\d\\d\\s\\[A-Za-z]{1,20} for example: once Java has processed the string escapes, the regex engine sees \d\d\s\[A-Za-z]{1,20} . This breaks down as follows:

  • \d : Match any digit
  • \d : Match any digit
  • \s : Match any whitespace character
  • \[ : Match the literal [ character
  • A-Za-z : Match the literal text A-Za-z
  • ] : Match the literal character ]
  • {1,20} : Match the preceding character ( ] ) 1-20 times.

The regex you probably want is \d\d\s[A-Za-z]{1,20} which, as an escaped Java string, is \\d\\d\\s[A-Za-z]{1,20} . Notice that there's no \\ before the [ .
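To see the difference concretely, here is a minimal sketch that runs the question's fourth address pattern next to the version without the stray backslash. The class name EscapedBracketDemo and the helper matches are made up for illustration; only Pattern.matches comes from the standard library.

```java
import java.util.regex.Pattern;

// Illustrative class; the name and helper are assumptions, not part of the question.
class EscapedBracketDemo {
    // Pattern from the question: "\\[" turns the bracket into a literal character.
    static final String BROKEN = "\\d\\d\\s\\[A-Za-z]{1,20}";
    // Same pattern without the stray backslash, so [A-Za-z] is a character class.
    static final String FIXED = "\\d\\d\\s[A-Za-z]{1,20}";

    // Returns true only if the whole input matches the regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches(BROKEN, "22 Broadway"));   // false
        System.out.println(matches(FIXED, "22 Broadway"));    // true
        // The broken pattern only accepts input that contains the literal text.
        System.out.println(matches(BROKEN, "22 [A-Za-z]]]")); // true
    }
}
```

The last line shows what the escaped pattern really describes: a literal [ , the literal text A-Za-z , and then one or more ] characters.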

Something else to keep in mind is that a regular expression can match anywhere in the string when you search with Matcher.find() , as your code does for the package ID. For example, the regex a would find a match in the string a but also in abc , bac , abracadabra , etc. To avoid this, use the anchors ^ and $ to match the start and end of the input respectively. Your regex then becomes ^\d\d\s[A-Za-z]{1,20}$ , or ^\\d\\d\\s[A-Za-z]{1,20}$ as a Java string. Note that String.matches() already requires the entire string to match, so the anchors are redundant there, though harmless.
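A small sketch of how anchoring interacts with Matcher.find() and String.matches() , using one of the addresses from the text file. The class AnchorDemo and its method names are invented for this example.

```java
import java.util.regex.Pattern;

// Illustrative class; names are assumptions made for this example.
class AnchorDemo {
    static final Pattern UNANCHORED = Pattern.compile("\\d\\d\\s[A-Za-z]{1,20}");
    static final Pattern ANCHORED = Pattern.compile("^\\d\\d\\s[A-Za-z]{1,20}$");

    // find() succeeds if any substring of the input matches.
    static boolean findsUnanchored(String s) {
        return UNANCHORED.matcher(s).find();
    }

    // With ^ and $ the match must span the whole input.
    static boolean findsAnchored(String s) {
        return ANCHORED.matcher(s).find();
    }

    // String.matches() always requires the whole input to match.
    static boolean matchesWhole(String s) {
        return s.matches(UNANCHORED.pattern());
    }

    public static void main(String[] args) {
        String addr = "100 East 20th Street";
        System.out.println(findsUnanchored(addr)); // true, it finds "00 East"
        System.out.println(findsAnchored(addr));   // false
        System.out.println(matchesWhole(addr));    // false
    }
}
```

For an input such as 22 Broadway all three checks return true, so the anchors only change the result when find() would otherwise accept a partial match.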

I also noticed that you're matching every column against the regex with the for loop for(String str : result){ . It seems to me that you should only be matching against result[1] , that is, pkgAddr .
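As a rough sketch of that idea, the following checks only the second column of each line. The class AddressColumnDemo and the method matchingAddresses are hypothetical, and the single pattern is the question's address[0] with the stray backslash removed; other address shapes from the file (such as 45 5th Ave or 101 B'way ) would need their own patterns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Illustrative class; names are assumptions, not part of the original program.
class AddressColumnDemo {
    // address[0] from the question without the backslash before '['.
    static final Pattern ADDRESS = Pattern.compile("\\d{1,3}\\s[A-Za-z]{1,20}");

    // Returns the address column of every line whose address matches ADDRESS.
    static List<String> matchingAddresses(List<String> lines) {
        List<String> found = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines, as the original code does
            }
            String[] cols = line.split(",");
            if (cols.length < 2) {
                continue; // no address column on this line
            }
            String addr = cols[1].trim();
            // Only the address column is tested, not every column.
            if (ADDRESS.matcher(addr).matches()) {
                found.add(addr);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "123-ABC-4567, 15 W. 15th St., 50.1",
                "456-BGT-9876,22 Broadway,24",
                "678-FGH-9846 ,45 5th Ave, 12.2");
        System.out.println(matchingAddresses(lines)); // [22 Broadway]
    }
}
```

Trimming the column before matching keeps the pattern free of leading-whitespace handling.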

A final note: take a look at Regex 101 . It lets you test your regular expressions against a bunch of inputs to see if they match.
