Parsing a table using regex - Java
I'm parsing the following AWS instance cost table:
m1.small 1 1 1.7 1 x 160 $0.044 per Hour
m1.medium 1 2 3.75 1 x 410 $0.087 per Hour
m1.large 2 4 7.5 2 x 420 $0.175 per Hour
m1.xlarge 4 8 15 4 x 420 $0.35 per Hour
There's a file with those costs:
Scanner input = new Scanner(file);
String[] values;
while (input.hasNextLine()) {
String line = input.nextLine();
values = line.split("\\s+"); // <-- not what I want...
for (String v : values)
System.out.println(v);
}
However, that gives me:
m1.small
1
1
1.7
1
x
160
$0.044
per
Hour
which is not what I want. Correctly parsed values (with the right regex) would look like this:
['m1.small', '1', '1', '1.7', '1 x 160', '$0.044', 'per Hour']
What would be the right regex to obtain this result? One can assume the table will always have the same pattern.
Try this fiddle: https://regex101.com/r/sP6zW5/1
([^\\s]+)\\s+(\\d+)\\s+(\\d+)\\s+([\\d\\.]+)\\s+(\\d+ x \\d+)\\s+(\\$\\d+\\.\\d+)\\s+(per \\w+)
Match each line against this pattern; the capture groups are your list.
I think using split is too complicated in your case. If the text always has the same format, matching it is just the reverse of string formatting.
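As a minimal sketch of how the pattern above could be applied in Java: the class name InstanceRowParser and the method parse are made up for illustration, and the sketch assumes every row follows the layout shown in the question (single spaces around the "x").

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
final class InstanceRowParser {
    // Same pattern as above, written as a Java string literal.
    private static final Pattern ROW = Pattern.compile(
        "([^\\s]+)\\s+(\\d+)\\s+(\\d+)\\s+([\\d\\.]+)\\s+(\\d+ x \\d+)\\s+(\\$\\d+\\.\\d+)\\s+(per \\w+)");

    // Returns the seven captured fields of one table row.
    static List<String> parse(String line) {
        Matcher m = ROW.matcher(line.trim());
        // matches() requires the whole line to fit the pattern.
        if (!m.matches()) {
            throw new IllegalArgumentException("Unexpected row format: " + line);
        }
        List<String> fields = new ArrayList<>();
        for (int i = 1; i <= m.groupCount(); i++) {
            fields.add(m.group(i));
        }
        return fields;
    }

    public static void main(String[] args) {
        // prints [m1.small, 1, 1, 1.7, 1 x 160, $0.044, per Hour]
        System.out.println(parse("m1.small 1 1 1.7 1 x 160 $0.044 per Hour"));
    }
}
```

Using matches() instead of find() makes a malformed row fail loudly rather than being partially matched; whether that is the right behavior depends on how clean the input file is.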
If you want to use a regular expression, you'd do this:
String s = "m1.small 1 1 1.7 1 x 160 $0.044 per Hour";
String spaces = "\\s+";
String type = "(.*?)";
String intNumber = "(\\d+)";
String doubleNumber = "([0-9.]+)";
String dollarNumber = "([$0-9.]+)";
String aXb = "(\\d+ x \\d+)";
String rest = "(.*)";
Pattern pattern = Pattern.compile(type + spaces + intNumber + spaces + intNumber + spaces + doubleNumber
+ spaces + aXb + spaces + dollarNumber + spaces + rest);
Matcher matcher = pattern.matcher(s);
while (matcher.find()) {
String[] fields = new String[] { matcher.group(1), matcher.group(2), matcher.group(3), matcher.group(4),
matcher.group(5), matcher.group(6), matcher.group(7) };
System.out.println(Arrays.toString(fields));
}
Notice how I've broken up the regular expression to keep it readable. (As one long String, it is hard to read and maintain.) There's another way of doing it, though. Since you know which fields are being split apart, you could just do this simple split and build a new array with the combined values:
String[] allFields = s.split("\\s+");
String[] result = new String[] {
allFields[0],
allFields[1],
allFields[2],
allFields[3],
allFields[4] + " " + allFields[5] + " " + allFields[6],
allFields[7],
allFields[8] + " " + allFields[9] };
System.out.println(Arrays.toString(result));
Split on one or more spaces, but only where the spaces appear in one of these contexts:
DIGIT - SPACES - NOT "x"
or
NOT "x" - SPACES - DIGIT
values = line.split("(?<=\\d)\\s+(?=[^x])|(?<=[^x])\\s+(?=\\d)");
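A small self-contained sketch of this lookaround split, for trying it against the rows from the question. The class name LookaroundSplitDemo and the method splitRow are hypothetical. It assumes fields are separated by single spaces: with two spaces between "x" and the following number, the second space is preceded by a non-"x" character and would become a split point.

```java
import java.util.Arrays;

// Hypothetical demo class, not part of the original answer.
final class LookaroundSplitDemo {
    // Split on whitespace that sits between a digit and a non-"x",
    // or between a non-"x" and a digit. This keeps "1 x 160" and
    // "per Hour" together.
    private static final String SEPARATOR =
        "(?<=\\d)\\s+(?=[^x])|(?<=[^x])\\s+(?=\\d)";

    static String[] splitRow(String line) {
        return line.trim().split(SEPARATOR);
    }

    public static void main(String[] args) {
        // prints [m1.small, 1, 1, 1.7, 1 x 160, $0.044, per Hour]
        System.out.println(Arrays.toString(
            splitRow("m1.small 1 1 1.7 1 x 160 $0.044 per Hour")));
    }
}
```

This approach is compact, but it depends on details of the data, such as the instance name not ending in "x" and the price ending in a digit, so the explicit pattern with capture groups may be easier to maintain.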