
Reversing a regex-based parser

I inherited a bank interface parser. The previous developer actually did this in a pretty slick way. The file that comes in from the bank is made up of fixed-length fields. The way he parses a record from the download is this:

    public static final String HEADER_RECORD_REGEX = "^(\\d{3})(\\d{12})(.{20})(\\d\\d)(\\d\\d)(\\d\\d)(\\d{12})(\\d\\d)$";

    private static final int BANK_ID      = 1;
    private static final int ACCOUNT_ID   = 2;
    private static final int COMPANY_NAME = 3;
    private static final int MONTH        = 4;
    private static final int DAY          = 5;
    private static final int YEAR         = 6;
    private static final int SEQUENCE     = 7;
    private static final int TYPE_CODE    = 8;
    private static final int GROUP_COUNT  = TYPE_CODE;

    if ( GROUP_COUNT == matcher.groupCount() )  {
        setBankId( matcher.group( BANK_ID ) );
        setAccountId( matcher.group( ACCOUNT_ID ) );
        setCompanyName( matcher.group( COMPANY_NAME ) );
        setProcessDate( matcher.group( MONTH ), matcher.group( DAY ),
                        matcher.group( YEAR ) );
        setSeqNumber( matcher.group( SEQUENCE ) );
        setTypeCode( matcher.group( TYPE_CODE ) );
    }

I have a new requirement to reverse this process and generate mock bank files so we can test. Is there a way I can reverse the process using this same regex to generate the file, or do I just go back to building a standard parser?

Thanks

This basically does what you ask for. You can play with it until it suits your needs.

import java.util.*;

class Main
{
    public static String getLine(String bankID, String acctID, String companyName, String month, String day, String year, String seq, String typeCode)
    {
        return new Formatter()
               .format("%3.3s%12.12s%20.20s%2.2s%2.2s%2.2s%12.12s%2.2s", 
                       bankID, acctID, companyName, month,
                       day, year, seq, typeCode)
               .toString(); // 1 semicolon, technically a 1 liner.  aww yeah
    }

    public static void main(String[] args)
    {
        String tester = "123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
        System.out.println(getLine(tester, tester, tester, tester,
                                   tester, tester, tester, tester));
    }
}

The output of that example is:

123123456789ABC123456789ABCDEFGHIJK121212123456789ABC12

Here's the ideone.
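One caveat with the %N.Ns approach: values shorter than the field are padded with spaces, and the \d groups of the header regex will not match spaces. A minimal sketch of one way around that, assuming the numeric fields are available as numbers rather than strings. The class name MockHeaderLine and the helper headerLine are made up for illustration and are not part of the original parser:

```java
import java.util.Locale;
import java.util.regex.Pattern;

class MockHeaderLine {
    // Same pattern as HEADER_RECORD_REGEX in the question.
    static final Pattern HEADER = Pattern.compile(
            "^(\\d{3})(\\d{12})(.{20})(\\d\\d)(\\d\\d)(\\d\\d)(\\d{12})(\\d\\d)$");

    // Hypothetical helper: numeric fields are zero-padded on the left, the
    // company name is space-padded on the right and cut at 20 characters.
    // Locale.ROOT keeps the digits ASCII regardless of the default locale.
    static String headerLine(long bankId, long accountId, String companyName,
                             int month, int day, int year, long seq, int typeCode) {
        return String.format(Locale.ROOT,
                "%03d%012d%-20.20s%02d%02d%02d%012d%02d",
                bankId, accountId, companyName,
                month, day, year % 100, seq, typeCode);
    }

    public static void main(String[] args) {
        String line = headerLine(123, 456789, "ACME Corp", 7, 4, 2013, 42, 1);
        System.out.println(line);
        // Check the mock line against the parser's own regex.
        System.out.println(HEADER.matcher(line).matches());
    }
}
```

Note that %03d only sets a minimum width, so a value with more digits than the field allows (or a negative one) would still produce a line the regex rejects. Matching the generated line against the parser's regex, as main does here, is a cheap way to catch that in a test.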

If by reversing you mean outputting an object to a file, then a parser is not what you need. All you need to do is implement a method that writes the same data members to a file in the same format. You can use String.format with the field lengths you have in the regex. With some refactoring, you can extract the commonalities between the regex and the format string, although you might consider that overkill, as this regex is fairly simple.
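As a rough sketch of what extracting those commonalities could look like, the field widths can live in one place and both the regex and the format string can be built from them. The class HeaderLayout and its WIDTHS and NUMERIC arrays are hypothetical names; the widths themselves are read off the regex in the question:

```java
import java.util.regex.Pattern;

class HeaderLayout {
    // Field widths in record order: bank id, account id, company name,
    // month, day, year, sequence, type code.
    static final int[] WIDTHS = {3, 12, 20, 2, 2, 2, 12, 2};
    // true where the question's regex uses \d; only the company name uses '.'.
    static final boolean[] NUMERIC = {true, true, false, true, true, true, true, true};

    // Builds a regex equivalent to the one in the question (\d{2} instead of \d\d).
    static String regex() {
        StringBuilder r = new StringBuilder("^");
        for (int i = 0; i < WIDTHS.length; i++) {
            r.append('(').append(NUMERIC[i] ? "\\d" : ".")
             .append('{').append(WIDTHS[i]).append("})");
        }
        return r.append('$').toString();
    }

    // Builds a "%w.ws" specifier per field: pad to w characters, cut at w.
    static String formatString() {
        StringBuilder f = new StringBuilder();
        for (int w : WIDTHS) {
            f.append('%').append(w).append('.').append(w).append('s');
        }
        return f.toString();
    }

    public static void main(String[] args) {
        String line = String.format(formatString(),
                "123", "123456789012", "ACME", "07", "04", "13", "000000000042", "01");
        System.out.println(line);
        System.out.println(Pattern.compile(regex()).matcher(line).matches());
    }
}
```

With this layout, changing a field width is a one-line edit that the parser and the generator both pick up. The numeric values passed in still have to be full-width digit strings, since %s pads with spaces.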

You need to step away from letting the regex control you. If you define your structure in another way (I use an enum below) from which you can derive both your regex and a formatter, then not only will the code become much more extensible, but you will also be able to make a marshaller and an unmarshaller from it.

Something like this may be a good start:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BankRecords {
  static enum AccountField {
    BANK_ID("\\d", 3) {
      @Override
      void fill ( Account a, String s ) {
        a.bankId = s;
      }
    },
    ACCOUNT_ID("\\d", 12) {
      @Override
      void fill ( Account a, String s ) {
        a.accountID = s;
      }
    },
    COMPANY_NAME(".", 20) {
      @Override
      void fill ( Account a, String s ) {
        a.companyName = s;
      }
    },
    MONTH("\\d", 2) {
      @Override
      void fill ( Account a, String s ) {
        a.month = s;
      }
    },
    DAY("\\d", 2) {
      @Override
      void fill ( Account a, String s ) {
        a.day = s;
      }
    },
    YEAR("\\d", 2) {
      @Override
      void fill ( Account a, String s ) {
        a.year = s;
      }
    },
    SEQUENCE("\\d", 12) {
      @Override
      void fill ( Account a, String s ) {
        a.seqNumber = s;
      }
    },
    TYPE_CODE("\\d", 2) {
      @Override
      void fill ( Account a, String s ) {
        a.typeCode = s;
      }
    };
    // The type string in the regex.
    final String type;
    // How many characters.
    final int count;

    AccountField(String type, int count) {
      this.type = type;
      this.count = count;
    }

    // Each field can fill its part in the Account.
    abstract void fill ( Account a, String s );

    // My pattern.
    static Pattern pattern = Pattern.compile(asRegex());

    public static Account parse ( String record ) {
      Account account = new Account ();
      // Fire off the matcher with the regex and put each field in the Account object.
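      Matcher matcher = pattern.matcher(record);
      // group() is only valid after a successful match, so run the match first.
      if ( !matcher.matches() ) {
        throw new IllegalArgumentException("Not a valid record: " + record);
      }
      for ( AccountField f : AccountField.values() ) {
        f.fill(account, matcher.group(f.ordinal() + 1));
      }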
      return account;
    }

    public static String format ( Account account ) {
      StringBuilder s = new StringBuilder ();
      // Not implemented in this sketch: roll each field of the account into the
      // string, padded or cut to the correct length from the enum.
      return s.toString();
    }

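    // No explicit "= null" here: static initializers run in textual order, so it
    // would execute after the pattern above is built and wipe the cached regex.
    private static String regex;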

    static String asRegex() {
      // Only do this once.
      if (regex == null) {
        // Grow my regex from the field definitions.
        StringBuilder r = new StringBuilder("^");
        for (AccountField f : AccountField.values()) {
          r.append("(").append(f.type);
          // Special case count = 1 or 2.
          switch (f.count) {
            case 1:
              break;
            case 2:
              // Just one more.
              r.append(f.type);
              break;
            default:
              // More than that should use the {} notation.
              r.append("{").append(f.count).append("}");
              break;
          }
          r.append(")");
        }
        // End of record.
        r.append("$");
        regex = r.toString();
      }
      return regex;
    }
  }

  public static class Account {
    String bankId;
    String accountID;
    String companyName;
    String month;
    String day;
    String year;
    String seqNumber;
    String typeCode;
  }
}

Note how each enum constant encapsulates the essence of its field: the type, the number of characters, and where it goes in the Account object.
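The format side needs each field to read its value back out of the Account as well as write it. A minimal sketch of that round trip, cut down to three fields to stay short. The class RoundTripSketch, the getter and setter lambdas, and the padding rules (zeros on the left for \d fields, spaces on the right otherwise) are assumptions made for illustration, not something the bank's format is known to require:

```java
import java.util.function.BiConsumer;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RoundTripSketch {
    static class Account {
        String bankId, companyName, typeCode;
    }

    enum Field {
        BANK_ID("\\d", 3, a -> a.bankId, (a, s) -> a.bankId = s),
        COMPANY_NAME(".", 20, a -> a.companyName, (a, s) -> a.companyName = s),
        TYPE_CODE("\\d", 2, a -> a.typeCode, (a, s) -> a.typeCode = s);

        final String type;   // regex character class for the field
        final int count;     // fixed width of the field
        final Function<Account, String> getter;
        final BiConsumer<Account, String> setter;

        Field(String type, int count, Function<Account, String> getter,
              BiConsumer<Account, String> setter) {
            this.type = type;
            this.count = count;
            this.getter = getter;
            this.setter = setter;
        }

        // Pad or cut the field's value to exactly `count` characters.
        String render(Account a) {
            String v = getter.apply(a);
            if (v.length() > count) return v.substring(0, count);
            StringBuilder sb = new StringBuilder(v);
            boolean numeric = type.equals("\\d");
            while (sb.length() < count) {
                if (numeric) sb.insert(0, '0'); else sb.append(' ');
            }
            return sb.toString();
        }
    }

    static String regex() {
        StringBuilder r = new StringBuilder("^");
        for (Field f : Field.values()) {
            r.append('(').append(f.type).append('{').append(f.count).append("})");
        }
        return r.append('$').toString();
    }

    static String format(Account a) {
        StringBuilder s = new StringBuilder();
        for (Field f : Field.values()) s.append(f.render(a));
        return s.toString();
    }

    static Account parse(String record) {
        Matcher m = Pattern.compile(regex()).matcher(record);
        if (!m.matches()) throw new IllegalArgumentException("Not a valid record: " + record);
        Account a = new Account();
        for (Field f : Field.values()) f.setter.accept(a, m.group(f.ordinal() + 1));
        return a;
    }
}
```

Calling format on an Account and feeding the result to parse should give back the same values in their padded form, which makes format(parse(line)).equals(line) a handy property to assert in the mock-file tests.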

