Java: read a big file line by line, splitting on \n ONLY

I have a file whose records are terminated by "\n" and whose columns are terminated by X"01", the first non-printing character (^A). And it is big: 7GB, which is far more than my laptop's memory can hold.

I have googled how to read a big file line by line using BufferedReader and similar classes. But my definition of a line is a bit different: the readLine function returns a line that ends with "\n", "^M" (carriage return), and so on.

I am wondering whether there is a solution in Java 6/7 for reading a big file line by line, where a line is defined as ending with \n ONLY.

Thanks!

I have a sample data set here, and I am wondering whether someone could run against the sample data and extract the first column (the timestamp) of every line.

Here is what I have done, but it only reads the first line:

import java.io.File;
import java.io.IOException;
import java.util.Scanner;

public class ParseAdafruit {

    public static void main(String[] args) throws IOException {
        // Predefine the delimiter ^A
        String delimiter = String.valueOf((char) 1);

        Scanner scanner = new Scanner(new File("/Users/.../data")).useDelimiter("\\n");
        while (scanner.hasNext()) {
            String line = scanner.next(); // This is your line
            String[] parts = line.split(delimiter);
            System.out.println(parts[0]);
        }
    }
}

Output

2014-01-28 18:00:41.960205

By the way, this was painless in Python with something like this:

for line in sys.stdin: 
    print line.split(chr(1))[0]

This is how to set a Scanner to split the contents of a file on "\n". I don't know what you do with each line, but if you want to accumulate the file into a single string, use a StringBuilder (or a StringBuffer if you need synchronization), because String is immutable.

Scanner scanner = new Scanner(new File("PathToFile")).useDelimiter("\\n");
while (scanner.hasNext()) {
    scanner.next(); // This is your line
}
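
As an alternative to the regex delimiter, the same "\n only" rule can be sketched by reading characters through a BufferedReader and cutting a record only when a '\n' appears, so a lone carriage return stays inside the record. This is a minimal sketch, not the asker's or the answerer's code: the class name LineFeedRecords, the method readRecord, and the in-memory sample data are all hypothetical, and a real run would wrap a FileInputStream in an InputStreamReader instead of using a StringReader.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class LineFeedRecords {

    /**
     * Hypothetical helper: reads characters up to the next '\n' and returns
     * them without the '\n'. A '\r' is treated as ordinary data.
     * Returns null once the input is exhausted.
     */
    public static String readRecord(Reader in) throws IOException {
        StringBuilder record = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            if (c == '\n') {
                return record.toString();
            }
            record.append((char) c);
        }
        // End of input: return a final unterminated record, or null if none is left.
        return record.length() > 0 ? record.toString() : null;
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample standing in for the real file; U+0001 separates columns.
        String sample = "2014-01-28 18:00:41\u0001a\rb\n2014-01-28 18:00:42\u0001c\n";
        BufferedReader in = new BufferedReader(new StringReader(sample));
        String record;
        while ((record = readRecord(in)) != null) {
            // Print the first column of each record.
            System.out.println(record.split("\u0001")[0]);
        }
        in.close();
    }
}
```

Only one record is held in memory at a time, which is what matters for a 7GB file. Passing a BufferedReader keeps the per-character read() calls cheap.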

It seems that the file encoding matters, so we read the file as UTF-8 through an InputStreamReader before handing it to the Scanner:

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.InputStreamReader;
import java.util.Scanner;

...

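String fileDir = "pathtodata";
// Column delimiter ^A (0x01), as defined in the question
String delimiter = String.valueOf((char) 1);
try
{
    // Decode the bytes as UTF-8 explicitly instead of relying on the platform default
    BufferedReader in = new BufferedReader(new InputStreamReader(
            new FileInputStream(fileDir), "UTF-8"));

    Scanner scanner = new Scanner(in).useDelimiter("\\n");
    while (scanner.hasNext())
    {
        String line = scanner.next(); // This is your line
        String[] parts = line.split(delimiter);
        System.out.println(parts[0]); // first column: the timestamp
    }
    scanner.close();
    in.close();
}
catch (Exception e)
{
    e.printStackTrace();
}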
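
To check whether a decoding problem is what stops the loop early, it may help to know that a Scanner treats an IOException from its underlying source as end of input and keeps that exception available through ioException(). The sketch below assumes the one-line output in the question comes from such a swallowed error, which the thread does not confirm. The class name ScannerDecodeCheck, the method scanAndReport, and the temporary test file are hypothetical.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Scanner;

public class ScannerDecodeCheck {

    /**
     * Hypothetical helper: scans the file with the given charset, splitting
     * on '\n', and returns the IOException the Scanner swallowed, or null
     * if the Scanner reached the real end of the file.
     */
    public static IOException scanAndReport(File file, String charsetName) throws IOException {
        Scanner scanner = new Scanner(file, charsetName).useDelimiter("\\n");
        try {
            while (scanner.hasNext()) {
                scanner.next(); // process the record here
            }
            return scanner.ioException();
        } finally {
            scanner.close();
        }
    }

    public static void main(String[] args) throws IOException {
        // Made-up test file containing the byte 0xFF, which is never valid in UTF-8.
        File file = File.createTempFile("records", ".dat");
        file.deleteOnExit();
        FileOutputStream out = new FileOutputStream(file);
        out.write(new byte[] {'a', '\n', (byte) 0xFF, 'b', '\n'});
        out.close();

        IOException swallowed = scanAndReport(file, "UTF-8");
        System.out.println(swallowed == null
                ? "reached end of file"
                : "stopped early: " + swallowed);
    }
}
```

If scanAndReport returns a non-null exception for the real data file, the loop ended because of an I/O or decoding error rather than because the file ran out of records.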
