
Read a text file in Java

Hey, I need to read a text file in Java. The problem is that the file has the following format:

Id time1 time2 time3 ...
ID2 time1 time2 time3 ...

I need to be able to first read all the IDs, then all the time1 values, then all the time2 values, and so on. Can anyone give me some hints on how to do this in Java? Efficiency is important here, since this needs to be done thousands of times; that is my real problem. Thanks in advance for your help.

Transpose the file: IDs on line 1, time1 values on line 2, and so on. Of course, this only pays off if the transposition can be done once and many reads of the transposed file are expected afterwards.
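The transposition could be sketched roughly as follows. This is only an illustration: the class name Transposer is made up, and it assumes whitespace-separated fields, the same number of fields on every row, and a file small enough to hold in memory.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any library.
class Transposer {
    /** Turns rows of "id t1 t2 ..." into one output line per column. */
    static List<String> transpose(List<String> rows) {
        List<StringBuilder> columns = new ArrayList<>();
        for (String row : rows) {
            String trimmed = row.trim();
            if (trimmed.isEmpty()) continue;          // skip blank lines
            String[] fields = trimmed.split("\\s+");  // assumption: whitespace-separated
            for (int i = 0; i < fields.length; i++) {
                if (i == columns.size()) columns.add(new StringBuilder());
                StringBuilder col = columns.get(i);
                if (col.length() > 0) col.append(' ');
                col.append(fields[i]);
            }
        }
        List<String> out = new ArrayList<>();
        for (StringBuilder col : columns) out.add(col.toString());
        return out;
    }
}
```

With the file names as placeholders, something like Files.write(Paths.get("transposed.txt"), Transposer.transpose(Files.readAllLines(Paths.get("input.txt")))) would produce the transposed file once; later runs can then read only the line they need.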

The simplest way would be to read the whole file line by line once, parsing the lines as you go. Then you can very easily get "all the IDs" followed by "all the first times", etc.
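A minimal sketch of that single pass, assuming whitespace-separated fields (the class and method names are invented for the example). It collects one list per column, so index 0 holds all the IDs, index 1 all the time1 values, and so on.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any library.
class ColumnReader {
    /** Reads every line once and returns one list per column. */
    static List<List<String>> readColumns(BufferedReader reader) throws IOException {
        List<List<String>> columns = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) continue;             // skip blank lines
            String[] fields = line.split("\\s+");     // assumption: whitespace-separated
            for (int i = 0; i < fields.length; i++) {
                if (i == columns.size()) columns.add(new ArrayList<>());
                columns.get(i).add(fields[i]);
            }
        }
        return columns;
    }
}
```

After the call, columns.get(0) is the list of IDs and columns.get(n) the list of n-th times, in file order. If rows can have different lengths the lists will no longer line up by row, so that case would need extra handling.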

If the file is too large to do that, you may want to consider writing a tool to change the file structure: open several files for writing (one per column), then read an input line, write each value to its column's file, move on to the next line, and so on. You can do this once and then read each file as and when you need it.
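Such a tool might look roughly like this. It is a sketch under assumptions: the class name and the col_N.txt naming scheme are invented, fields are taken to be whitespace-separated, and the number of columns is assumed small enough to keep one writer open per column.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any library.
class ColumnSplitter {
    /** Writes column i of the input to outputDir/col_i.txt, one value per line. */
    static List<Path> split(Path input, Path outputDir) throws IOException {
        List<Path> files = new ArrayList<>();
        List<BufferedWriter> writers = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty()) continue;
                String[] fields = line.split("\\s+");
                for (int i = 0; i < fields.length; i++) {
                    if (i == writers.size()) {
                        // First time this column is seen: open its output file.
                        Path p = outputDir.resolve("col_" + i + ".txt");
                        files.add(p);
                        writers.add(Files.newBufferedWriter(p, StandardCharsets.UTF_8));
                    }
                    BufferedWriter w = writers.get(i);
                    w.write(fields[i]);
                    w.newLine();
                }
            }
        } finally {
            // Simplified cleanup: a failure while closing one writer skips the rest.
            for (BufferedWriter w : writers) w.close();
        }
        return files;
    }
}
```

Only one input line is held in memory at a time, so this works for files that do not fit in memory. Reading "all the time1 values" afterwards is then a plain sequential read of col_1.txt.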

One solution is to parse the file once and create an index of the position of each ID in the file. Then you can reposition the read 'cursor' to any ID as needed.
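One possible shape for such an index, using java.io.RandomAccessFile, is sketched below. The class and method names are invented, and the sketch assumes ASCII content, because RandomAccessFile.readLine converts bytes to characters one by one. It is also unbuffered, so building the index this way is slow on large files.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of any library.
class IdIndex {
    /** Maps each ID (first token of a line) to the byte offset where its line starts. */
    static Map<String, Long> build(RandomAccessFile file) throws IOException {
        Map<String, Long> offsets = new HashMap<>();
        file.seek(0);
        long start = 0;
        String line;
        while ((line = file.readLine()) != null) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                offsets.put(trimmed.split("\\s+")[0], start);
            }
            start = file.getFilePointer();   // offset of the next line
        }
        return offsets;
    }

    /** Seeks to the indexed line and returns its fields, or null if the ID is unknown. */
    static String[] lookup(RandomAccessFile file, Map<String, Long> offsets, String id)
            throws IOException {
        Long offset = offsets.get(id);
        if (offset == null) return null;
        file.seek(offset);
        return file.readLine().trim().split("\\s+");
    }
}
```

The index costs one full pass to build, after which any ID's row can be fetched with a single seek and one line read.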

EDIT

This solution is practical if the whole file content cannot be loaded into memory. To limit the number of physical reads, an LRU cache keeping the most recently read or used ID/times combinations could improve performance.
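One common way to get an LRU cache in plain Java is a LinkedHashMap in access order with removeEldestEntry overridden. The sketch below takes that route; the class name LruCache and the capacity parameter are choices made for the example, not something the answer specifies.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: a small LRU cache built on LinkedHashMap.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        // accessOrder = true: entries are ordered from least to most recently accessed.
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each insertion; evict the least recently used entry when over capacity.
        return size() > capacity;
    }
}
```

A cache such as LruCache<String, String[]> keyed by ID can then be checked before seeking in the file, and filled after each physical read. Like LinkedHashMap itself, it is not thread-safe.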

We can't read files column by column. Read the whole file into memory (for example with a java.io.FileReader wrapped in a BufferedReader, or with Files.readAllLines from java.nio.file) and parse the content (String#split on each line) into a data structure like

Map<String, List<String>>

where the map's key is the ID (ID, ID2, ...) and the value is a simple list that contains all the time values.
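A minimal sketch of building and querying that map, assuming whitespace-separated fields and unique IDs (the class and method names are made up for illustration). A LinkedHashMap is used here so that iteration follows file order, which the answer does not require.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of any library.
class TimesById {
    /** Parses lines of "id t1 t2 ..." into id -> [t1, t2, ...]. */
    static Map<String, List<String>> parse(List<String> lines) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String line : lines) {
            String[] fields = line.trim().split("\\s+");
            if (fields[0].isEmpty()) continue;   // blank line
            result.put(fields[0],
                new ArrayList<>(Arrays.asList(fields).subList(1, fields.length)));
        }
        return result;
    }

    /** Collects the n-th time (0-based) of every ID, in file order. */
    static List<String> column(Map<String, List<String>> data, int n) {
        List<String> out = new ArrayList<>();
        for (List<String> times : data.values()) out.add(times.get(n));
        return out;
    }
}
```

With "input.txt" as a placeholder, TimesById.parse(Files.readAllLines(Paths.get("input.txt"))) builds the map; data.keySet() then gives all the IDs and column(data, 0) all the time1 values. The column method throws IndexOutOfBoundsException if a row has fewer than n + 1 times.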

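If you are on a Linux/UNIX platform, you can use the cut command to do some preprocessing. For example, assuming the fields are separated by single spaces, cut -d' ' -f2 file.txt extracts the second column.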
