Java parsing text file

I need to write a parser for text files (at least 20 KB), and I need to determine whether any words from a set of words (about 400 words and numbers) appear in the file. So I am looking for the most efficient way to do this (if a match is found, I need to do some further processing of that line and the line before it).

What I currently do is exclude lines that certainly contain no useful information (a kind of metadata line) and then compare word by word, but I don't think comparing word by word is the most efficient approach.

Can anyone please provide some tips, hints or ideas?

Thank you very much

It depends on what you mean by "efficient".

If you want a very straightforward way to code it, keep in mind that Java's String class has the method String.contains(CharSequence sequence).

Then you could read the file content into a String and iterate over the keywords you want to check, using contains() to see whether any of them appear in that String.
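A minimal sketch of that contains() approach, assuming Java 11+ (for Files.readString and Path.of) and a UTF-8 file. The class name ContainsScan, the method keywordsIn and the sample keywords are hypothetical. Note that String.contains() does plain substring matching, so a keyword like "cat" also matches inside "concatenate".

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class ContainsScan {

    // Returns every keyword that occurs anywhere in the text.
    // String.contains() is a substring test, not a whole-word test.
    static List<String> keywordsIn(String text, Set<String> keywords) {
        List<String> found = new ArrayList<>();
        for (String keyword : keywords) {
            if (text.contains(keyword)) {
                found.add(keyword);
            }
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical keyword set; the real one would hold the ~400 words and numbers.
        Set<String> keywords = Set.of("ERROR", "404");
        // Read the whole file into one String (Java 11+, UTF-8), or fall back to sample text.
        String content = args.length > 0
                ? Files.readString(Path.of(args[0]))
                : "GET /missing returned 404";
        System.out.println(keywordsIn(content, keywords));
    }
}
```

Each contains() call scans the whole text again, so the work grows with (number of keywords) × (file size). It also only tells you which keywords occur, so locating the matching line and its previous line would still need a separate pass.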

How about the following:

Put all your keywords in a HashSet (Set<String> keywords;)
Read the file one line at a time
For each line in file:
    Tokenize into words
    For each word in line:
        If word is contained in keywords (keywords.contains(word))
            Process current line
            If previous line is available
                Process previous line
    Keep track of previous line (prevLine = line;)
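The steps above could look roughly like this in Java. This is a sketch under assumptions that are not in the answer: the names KeywordLineScanner, scan and Match are hypothetical; "words" are taken to be runs of Unicode letters and digits (so a keyword containing punctuation, such as 3.14, would never match); and each line is reported at most once even if it contains several keywords. The actual "processing" is left to the caller, which receives each matching line together with its previous line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class KeywordLineScanner {

    // One hit: the keyword, the line it was found in, and the line before it (null for line 1).
    static final class Match {
        final String keyword;
        final String line;
        final String previousLine;

        Match(String keyword, String line, String previousLine) {
            this.keyword = keyword;
            this.line = line;
            this.previousLine = previousLine;
        }
    }

    static List<Match> scan(BufferedReader reader, Set<String> keywords) throws IOException {
        List<Match> matches = new ArrayList<>();
        String prevLine = null;
        String line;
        while ((line = reader.readLine()) != null) { // read one line at a time
            // Tokenize: split on anything that is not a Unicode letter or digit.
            for (String word : line.split("[^\\p{L}\\p{N}]+")) {
                if (keywords.contains(word)) { // average O(1) HashSet lookup
                    matches.add(new Match(word, line, prevLine));
                    break; // report each line once
                }
            }
            prevLine = line; // remember this line for the next iteration
        }
        return matches;
    }

    public static void main(String[] args) throws IOException {
        Set<String> keywords = new HashSet<>(List.of("alpha", "42")); // hypothetical keywords
        BufferedReader reader = args.length > 0
                ? Files.newBufferedReader(Path.of(args[0]), StandardCharsets.UTF_8)
                : new BufferedReader(new StringReader("header\nvalue alpha, here\n42 items"));
        try (reader) {
            for (Match m : scan(reader, keywords)) {
                System.out.println(m.keyword + ": " + m.previousLine + " | " + m.line);
            }
        }
    }
}
```

Because a HashSet lookup takes roughly constant time no matter how many keywords there are, the cost here is dominated by reading and splitting the file once, rather than by rescanning the text once per keyword.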

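Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.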

© 2020-2024 STACKOOM.COM