Plain text parsing using Java

I have output from an ssh command like the one below. I want to parse it into a HashMap using Java. Any suggestions would be helpful.

Name        : mysql                        Relocations: (not relocatable)
Version     : 5.1.61                            Vendor: CentOS
Release     : 4.el6                         Build Date: Fri 22 Jun 2012 05:58:59 AM PDT
Install Date: Tue 13 Nov 2012 02:23:23 AM PST      Build Host: c6b10.bsys.dev.centos.org
URL         : http://www.mysql.com
Summary     : MySQL client programs and shared libraries

My output should be a HashMap like this:

Key          Value
Name         mysql
Relocations  (not relocatable)
Version      5.1.61
Release      4.el6

A regular expression should do the trick here:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        // Sample input, as it would come back from the ssh command.
        StringBuilder sb = new StringBuilder();
        sb.append("Name        : mysql                        Relocations: (not relocatable)\n");
        sb.append("Version     : 5.1.61                            Vendor: CentOS\n");
        sb.append("Release     : 4.el6                         Build Date: Fri 22 Jun 2012 05:58:59 AM PDT\n");
        sb.append("Install Date: Tue 13 Nov 2012 02:23:23 AM PST      Build Host: c6b10.bsys.dev.centos.org\n");
        sb.append("URL         : http://www.mysql.com\n");
        sb.append("Summary     : MySQL client programs and shared libraries\n");

        // Group 1 is the key (everything up to the first colon). Group 2 is the value,
        // which ends at two or more whitespace characters, a line break, or the end of input.
        Pattern p = Pattern.compile("([^\\r\\n:]+):\\s(.+?)(\\s{2,}|\\r\\n|\\r|\\n|$)");
        Matcher m = p.matcher(sb.toString());
        while (m.find()) {
            String key = m.group(1).trim(); // drop the padding before the colon
            String value = m.group(2);

            System.out.println(key + " = \"" + value + "\"");
        }
    }
}

This outputs:

Name = "mysql"
Relocations = "(not relocatable)"
Version = "5.1.61"
Vendor = "CentOS"
Release = "4.el6"
Build Date = "Fri 22 Jun 2012 05:58:59 AM PDT"
Install Date = "Tue 13 Nov 2012 02:23:23 AM PST"
Build Host = "c6b10.bsys.dev.centos.org"
URL = "http://www.mysql.com"
Summary = "MySQL client programs and shared libraries"
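Since the question asks for a hashmap rather than printed output, the same loop can put each pair into a map instead. The following is a minimal sketch, not a definitive implementation: the class name RpmInfoToMap and the method parse are hypothetical names chosen for illustration, and LinkedHashMap is used only so that the keys keep their input order (a plain HashMap would work as well).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name and method are illustrative only.
class RpmInfoToMap {
    // Same pattern as in the answer above.
    private static final Pattern FIELD =
            Pattern.compile("([^\\r\\n:]+):\\s(.+?)(\\s{2,}|\\r\\n|\\r|\\n|$)");

    // Collects every "key: value" pair found in the text into a map.
    static Map<String, String> parse(String text) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = FIELD.matcher(text);
        while (m.find()) {
            result.put(m.group(1).trim(), m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        String text =
                "Name        : mysql                        Relocations: (not relocatable)\n"
              + "Version     : 5.1.61                            Vendor: CentOS\n";
        Map<String, String> info = parse(text);
        System.out.println(info.get("Name"));   // mysql
        System.out.println(info.get("Vendor")); // CentOS
    }
}
```

If two fields ever share the same key, the later value overwrites the earlier one, which is the usual Map.put behaviour.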

Try this regex as a starting point:

([a-zA-Z][a-zA-Z ]*): (.*?)(( {2,})|$)

The first group should capture the key and the second group the value. It assumes two things:

1) There are at least two spaces or the end of the line after a value.
2) There are never two consecutive spaces within a value.

(It is important that these assumptions really hold. They are true in your example, but you would need to verify that they are always true for your input.)

I tested it against your example above and it seems to work. You can try it at http://regexpal.com/ (you need to enable the checkbox "^$ match at line breaks" at the top to make it work).

If that regex is OK, use Pattern and Matcher from the Java API to build up your hashmap. Also, you should trim() your matched keys and values to get rid of the extra spaces at the end.
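As a rough sketch of how this regex could be wired up with Pattern and Matcher: the class name LineRegexParser and the method parse are hypothetical, and the Pattern.MULTILINE flag is an assumption that plays the role of the "^$ match at line breaks" checkbox, so that $ matches at the end of each line rather than only at the end of the input.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name and method are illustrative only.
class LineRegexParser {
    // MULTILINE makes $ match at the end of every line, not only at the end of the input.
    private static final Pattern FIELD =
            Pattern.compile("([a-zA-Z][a-zA-Z ]*): (.*?)(( {2,})|$)", Pattern.MULTILINE);

    static Map<String, String> parse(String text) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = FIELD.matcher(text);
        while (m.find()) {
            // trim() removes the padding spaces captured around keys and values.
            result.put(m.group(1).trim(), m.group(2).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        String text =
                "Release     : 4.el6                         Build Date: Fri 22 Jun 2012 05:58:59 AM PDT\n"
              + "Summary     : MySQL client programs and shared libraries\n";
        for (Map.Entry<String, String> e : parse(text).entrySet()) {
            System.out.println(e.getKey() + " = \"" + e.getValue() + "\"");
        }
    }
}
```

Because the key group only allows letters and spaces, the colons inside the timestamp are consumed as part of the value and do not start a new key.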

Try creating some regular expressions. Alternatively, since many of the items are well defined, you can find their start and end points using String.indexOf(), extract each piece with substring(), and build the hashmap from those.
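One way the indexOf() and substring() approach might look is sketched below. It relies on the same assumptions as the regex answers (the first colon on a piece ends the key, and two consecutive spaces separate one field from the next); the class name IndexOfParser and the method parse are hypothetical names used only for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class; the name and method are illustrative only.
class IndexOfParser {
    static Map<String, String> parse(String text) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String line : text.split("\\r?\\n")) {
            String rest = line.trim();
            while (!rest.isEmpty()) {
                int colon = rest.indexOf(':');
                if (colon < 0) {
                    break; // no key left on this line
                }
                String key = rest.substring(0, colon).trim();
                String afterColon = rest.substring(colon + 1).trim();
                // Assumption: two consecutive spaces mark the gap before the next field.
                int gap = afterColon.indexOf("  ");
                if (gap < 0) {
                    result.put(key, afterColon);
                    rest = "";
                } else {
                    result.put(key, afterColon.substring(0, gap));
                    rest = afterColon.substring(gap).trim();
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text =
                "Install Date: Tue 13 Nov 2012 02:23:23 AM PST      Build Host: c6b10.bsys.dev.centos.org\n"
              + "URL         : http://www.mysql.com\n";
        for (Map.Entry<String, String> e : parse(text).entrySet()) {
            System.out.println(e.getKey() + " = \"" + e.getValue() + "\"");
        }
    }
}
```

Only the first colon of each piece is treated as the key separator, so the colons in the timestamp and in the URL stay inside their values.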
