Parsing a text file in Java

Question

I need to write a parser for text files (at least 20 kB each), and I need to determine whether any word from a set of about 400 words and numbers appears in the file. I am looking for the most efficient way to do this. If a match is found, I need to do some further processing of the matching line and the line before it.

What I currently do is exclude lines that definitely contain no information (metadata-like lines) and then compare word by word, but I don't think that comparing word by word alone is the most efficient approach.

Can anyone please provide some tips, hints, or ideas?

Thank you very much.

Answer 1

It depends on what you mean by "efficient".

If you want a very straightforward way to code it, keep in mind that the String class in Java has the method String.contains(CharSequence sequence).

Then you could read the file content into a String and iterate over the keywords you want to check, using contains() to see whether any of them appear in that String.
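A minimal sketch of that idea, assuming the file content has already been loaded into a String. The class name ContainsScan, the method findKeywords, and the sample text are made up for illustration. Note that contains() matches substrings, so a keyword such as "cat" would also match inside "concatenate", and the text is scanned once per keyword.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ContainsScan {
    // Returns the keywords that occur anywhere in the text as substrings.
    // Hypothetical helper for illustration; not part of the original answer.
    static List<String> findKeywords(String text, List<String> keywords) {
        List<String> found = new ArrayList<>();
        for (String keyword : keywords) {
            if (text.contains(keyword)) {
                found.add(keyword);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Sample text standing in for the file content.
        String text = "first line\nerror code 42 reported\nlast line";
        List<String> keywords = Arrays.asList("error", "42", "warning");
        System.out.println(findKeywords(text, keywords));
    }
}
```

This only tells you which keywords occur, not on which line, so it does not by itself give access to the matching line and the one before it.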

Answer 2

How about the following:

Put all your keywords in a HashSet (Set<String> keywords;)
Read the file one line at a time
For each line in the file:
  Tokenize the line into words
  For each word in the line:
    If the word is contained in keywords (keywords.contains(word)):
      Process the current line
      If a previous line is available:
        Process the previous line
  Keep track of the previous line (prevLine = line;)
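The steps above could be sketched in Java roughly as follows. The class name KeywordLineScanner, the method scan, and the sample input are hypothetical, and the sketch assumes that words are separated by whitespace and that "processing" simply means collecting each matching line together with its predecessor.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class KeywordLineScanner {
    // Returns one entry per matching line: {previous line (or null), matching line}.
    static List<String[]> scan(BufferedReader reader, Set<String> keywords) throws IOException {
        List<String[]> hits = new ArrayList<>();
        String prevLine = null;
        String line;
        while ((line = reader.readLine()) != null) {
            // Assumption: whitespace-separated words; adapt the regex if punctuation matters.
            for (String word : line.trim().split("\\s+")) {
                if (keywords.contains(word)) { // constant-time lookup on average
                    hits.add(new String[] {prevLine, line});
                    break; // report each line at most once
                }
            }
            prevLine = line;
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        Set<String> keywords = new HashSet<>(Arrays.asList("error", "42"));
        // A StringReader stands in for a FileReader on the real file.
        String sample = "header\nall good here\nerror in module\nnothing";
        try (BufferedReader reader = new BufferedReader(new StringReader(sample))) {
            for (String[] hit : scan(reader, keywords)) {
                System.out.println("previous: " + hit[0] + " | match: " + hit[1]);
            }
        }
    }
}
```

With a HashSet, each word costs one lookup regardless of how many keywords there are, so the file is read only once. Unlike String.contains(), this matches whole tokens only.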


Answer 1: 1, 2012-08-01 10:05:22

Answer 2: 0, accepted, 2012-08-01 10:51:52
