
Parse text Java regex match before whitespace, and numbers after whitespace to generate CSV

At the moment I'm using this simple regex:

[^\s]

I cobbled it together with the help of these docs.

It can grab the following information:

[image: enter image description here]

However, the full dataset looks like this:

#### LOGS ####
CONSOLE:
makePush            2196
makePush            638
makePush            470
opAdd           8342
opAdd           288
opStop          133
0x
DEBUG:
#### TRACE ####
PUSH32          pc=00000000 gas=10000000000 cost=3

PUSH32          pc=00000033 gas=9999999997 cost=3
Stack:
00000000  0000000000000000000000000000000000000000000000000000000000000005

PUSH32          pc=00000066 gas=9999999994 cost=3
Stack:
00000000  0000000000000000000000000000000000000000000000000000000000000005
00000001  0000000000000000000000000000000000000000000000000000000000000005

ADD             pc=00000099 gas=9999999991 cost=3
Stack:
00000000  0000000000000000000000000000000000000000000000000000000000000005
00000001  0000000000000000000000000000000000000000000000000000000000000005
00000002  0000000000000000000000000000000000000000000000000000000000000005

ADD             pc=00000100 gas=9999999988 cost=3
Stack:
00000000  000000000000000000000000000000000000000000000000000000000000000a
00000001  0000000000000000000000000000000000000000000000000000000000000005

STOP            pc=00000101 gas=9999999985 cost=0
Stack:
00000000  000000000000000000000000000000000000000000000000000000000000000f

Finally, I need my result to look like this:

makePush, 2196
makePush, 638
makePush, 470
opAdd, 8342
opAdd, 288
opStop, 133

The regex I've provided is certainly not robust enough to capture that.

What I'm trying to do is:

  • Ignore any line in the input that doesn't have the form makePush 2196

  • For lines that are of the form depicted above...

    • Split each one into three groups:

      first word, whitespace, second word

  • Finally, I want to save a CSV of the form:

    first word, second word
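The steps listed above could be sketched in Java roughly as follows. This is only a minimal illustration, assuming the first word consists of letters and the second of digits; the class name ThreeGroupSketch and the method toCsvRows are hypothetical names chosen for the example, not something from the original post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the class and method names are illustrative only.
class ThreeGroupSketch {
    // Group 1: first word (letters), group 2: the whitespace run, group 3: the number.
    private static final Pattern LINE = Pattern.compile("([A-Za-z]+)(\\s+)(\\d+)");

    static List<String> toCsvRows(String input) {
        List<String> rows = new ArrayList<>();
        for (String line : input.split("\\R")) {   // split on any line break
            Matcher m = LINE.matcher(line);
            if (m.matches()) {                     // the whole line must have the form
                rows.add(m.group(1) + ", " + m.group(3));
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        String sample = "CONSOLE:\nmakePush            2196\nopStop          133\n0x\n";
        toCsvRows(sample).forEach(System.out::println);
    }
}
```

Because matches() has to consume the entire line, lines such as CONSOLE:, 0x, or the trace entries are dropped without any extra filtering.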

Try this?

/([a-zA-Z]+)[\t ]+(\d+)/g

where

  • ([a-zA-Z]+) matches a single word made up of letters
  • [\t ]+ matches horizontal whitespace (tabs and spaces)
  • (\d+) matches the number (one or more digits)
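Since the expression above is written in /.../g notation, here is a rough sketch of how it might be used from Java. The class name FindAllSketch and the method toCsv are made up for this example. Two things differ from the slash notation: every regex backslash is doubled inside a Java string literal, and the global flag is replaced by calling find() in a loop.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper; the names are illustrative only.
class FindAllSketch {
    // Same expression as above; each regex backslash is doubled in a Java string literal.
    static final Pattern PAIR = Pattern.compile("([a-zA-Z]+)[\\t ]+(\\d+)");

    static String toCsv(String input) {
        StringBuilder csv = new StringBuilder();
        Matcher m = PAIR.matcher(input);
        while (m.find()) {   // repeated find() plays the role of the /g flag
            csv.append(m.group(1)).append(", ").append(m.group(2)).append('\n');
        }
        return csv.toString();
    }
}
```

Note that the pattern is not anchored, so it would also pick up a letters-then-digits pair in the middle of a longer line. The sample data in the question contains no such pair, but other logs might.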

Try this (idea from Pshemo, but using \w+):

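        // Requires java.util.regex.Pattern and java.util.regex.Matcher.
        // MULTILINE makes ^ and $ match at every line boundary, not only at the ends of str.
        Pattern pattern = Pattern.compile("^(\\w+)\\s+(\\d+)$", Pattern.MULTILINE);
        Matcher matcher = pattern.matcher(str);
        while (matcher.find())
        {
            // Note: \w also matches digits, so stack lines made only of digits match too.
            System.out.println(matcher.group(1)+", "+matcher.group(2));
        }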
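The loop above prints the rows. To save them as a file, as the question asks, something like the following sketch might work. The class name LogToCsvFile, the method convert, and the file names in the usage comment are all hypothetical. The pattern is also a variation rather than the one given above: it assumes the first word starts with a letter, which keeps the all-digit stack dump lines out of the result, and it uses Pattern.MULTILINE so that ^ and $ apply per line.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical converter; the names are illustrative only.
class LogToCsvFile {
    // Assumption: the first word starts with a letter, which skips all-digit stack lines.
    private static final Pattern ROW =
            Pattern.compile("^([A-Za-z]\\w*)[ \\t]+(\\d+)$", Pattern.MULTILINE);

    static void convert(Path logFile, Path csvFile) throws IOException {
        String log = new String(Files.readAllBytes(logFile), StandardCharsets.UTF_8);
        List<String> rows = new ArrayList<>();
        Matcher m = ROW.matcher(log);
        while (m.find()) {
            rows.add(m.group(1) + ", " + m.group(2));
        }
        Files.write(csvFile, rows, StandardCharsets.UTF_8);   // one row per line
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: java LogToCsvFile trace.log out.csv
        convert(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

Reading the whole file into memory keeps the sketch short. For very large logs, processing line by line would be the safer choice.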
