
Parse data from text file in Java

I'm trying to create a parser in Java that would help me extract some details from a text file.

The data in the file looks like this, but with more entries:

. 
http://www.someurl1.com/
PERSONAL ADDRESS: Mozart, W.A.; Some address 1, Austria; email: mymail1@mail.com

. 
http://www.someurl2.com/
PERSONAL ADDRESS: Beethoven, L.V.; Some address 2, Germany; email: mymail2@mail.com

As you can see, the data always follows the same pattern, and all I want to extract is the name and the e-mail for every entry. A good output would be:

Mozart, W.A. ; mymail1@mail.com
Beethoven, L.V. ; mymail2@mail.com

Every entry starts with a dot (.) followed by a space on the first line. The next line, below the dot, contains the URL. The line after that holds more data: name, address and e-mail, all separated by a semicolon (;).

This isn't hard, but I'm having some trouble getting started. I've created a Main class in which I read the text file into a String. But I really don't know what the best way is to parse something like this in Java: should I use regular expressions, or just look for the ; separators?

Read the text file line by line and then act based on each line.

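// Requires java.io.BufferedReader and java.io.FileReader
try (BufferedReader br = new BufferedReader(new FileReader(file))) {
    String line;
    while ((line = br.readLine()) != null) {
        // An entry starts with a line that holds only a dot (plus trailing whitespace)
        if (line.trim().equals(".")) {
            // Second line of the entry: the URL
            String url = br.readLine();
            // Third line of the entry: name, address and e-mail
            String details = br.readLine();
            if (details == null) {
                break; // the file ended in the middle of an entry
            }
            // Split the third line on whitespace. StringUtils comes from Apache Commons Lang;
            // details.split("\\s+") gives the same result here with the standard library.
            // split[2] is "Mozart," (split[0] is "PERSONAL", split[1] is "ADDRESS:"),
            // so the name still needs a little more work.
            String[] split = StringUtils.split(details);
        }
    }
} // try-with-resources closes the reader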
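
Since the question mentions that the file content is already in a String and that the fields are separated by ;, the same idea can also be sketched by splitting on the semicolon instead of on whitespace. This is only an illustration under those assumptions: the class name EntryParser, the method parse and the helper stripPrefix are made up for this example, and the prefixes "PERSONAL ADDRESS:" and "email:" are taken from the sample data.

```java
import java.util.ArrayList;
import java.util.List;

public class EntryParser {
    // Prefixes as they appear in the sample data from the question
    private static final String NAME_PREFIX = "PERSONAL ADDRESS:";
    private static final String EMAIL_PREFIX = "email:";

    /** Returns one "name ; email" string per entry found in the content. */
    public static List<String> parse(String content) {
        List<String> results = new ArrayList<>();
        String[] lines = content.split("\\R"); // \R matches any line break (Java 8+)
        for (int i = 0; i + 2 < lines.length; i++) {
            if (!lines[i].trim().equals(".")) {
                continue; // not the start of an entry
            }
            String details = lines[i + 2]; // lines[i + 1] is the URL, unused here
            i += 2;                        // jump past the URL and details lines
            String[] fields = details.split(";");
            if (fields.length < 3) {
                continue; // malformed entry, skip it
            }
            String name = stripPrefix(fields[0].trim(), NAME_PREFIX);
            String email = stripPrefix(fields[fields.length - 1].trim(), EMAIL_PREFIX);
            results.add(name + " ; " + email);
        }
        return results;
    }

    // Removes the prefix if present and trims what is left
    private static String stripPrefix(String text, String prefix) {
        return text.startsWith(prefix) ? text.substring(prefix.length()).trim() : text;
    }

    public static void main(String[] args) {
        String sample = ". \nhttp://www.someurl1.com/\n"
                + "PERSONAL ADDRESS: Mozart, W.A.; Some address 1, Austria; email: mymail1@mail.com\n"
                + "\n. \nhttp://www.someurl2.com/\n"
                + "PERSONAL ADDRESS: Beethoven, L.V.; Some address 2, Germany; email: mymail2@mail.com\n";
        parse(sample).forEach(System.out::println);
    }
}
```

Run on the two sample entries, this prints the two lines asked for in the question, "Mozart, W.A. ; mymail1@mail.com" and "Beethoven, L.V. ; mymail2@mail.com". It relies on the name being the first ;-separated field and the e-mail being the last one, so an address that itself contains a ; would still work, but an entry with fewer than three fields is skipped.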

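Getting the name by splitting the string is easy; then use a regular expression to catch the email part. There are a lot of examples; here is one of them (it lists only uppercase letters, so in Java compile it with Pattern.CASE_INSENSITIVE to match lowercase addresses such as the ones in the question):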

\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b
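
A minimal sketch of how this pattern could be applied with java.util.regex follows. The class name EmailFinder and the method firstEmail are hypothetical names chosen for the example; the pattern is the one above, with backslashes doubled for a Java string literal and the case-insensitive flag added so that lowercase addresses match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmailFinder {
    // The pattern from the answer; CASE_INSENSITIVE lets [A-Z] match lowercase too
    private static final Pattern EMAIL = Pattern.compile(
            "\\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}\\b",
            Pattern.CASE_INSENSITIVE);

    /** Returns the first e-mail address found in the line, or null if there is none. */
    public static String firstEmail(String line) {
        Matcher matcher = EMAIL.matcher(line);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        String line = "PERSONAL ADDRESS: Mozart, W.A.; Some address 1, Austria; email: mymail1@mail.com";
        System.out.println(firstEmail(line));
    }
}
```

For the sample line this prints mymail1@mail.com. Note that the {2,4} quantifier restricts the top-level domain to two to four letters, so addresses under longer top-level domains would not match without loosening that part.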


 