Java CSVReader skips rows & how to transform csv

I have been researching this all day, and no matter how I write the code, the result is not what I want it to be.

First things first: I am working with Big Data, so I don't think it is efficient to keep copying and pasting row entries. I'm reading a CSV file, and that part works: it cuts out everything I tell it to cut out. Everything is fine so far. The only thing going wrong is that (in my opinion) Eclipse (Java) cuts the headers/column names out of the CSV file. How do I fix this?

package data;

import java.io.FileReader;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

import com.opencsv.CSVReader;

public class BelgiumParser {

    public static void main(String[] args) {

        String fileName = "src\\data\\Belgium.csv";

        try {
            // Reads the whole file (header line included) into memory at once
            List<String> listBelgium = Files.readAllLines(Paths.get(fileName));

            //CSVReader reader = new CSVReader(new FileReader("src\\data\\Belgium.csv"), ',', '"', 1);

            for (String line : listBelgium) {
                line = line.replace("\"", "");    // drop quote characters
                line = line.replaceAll("T", " "); // T -> space
                line = line.replaceAll("Z", "");  // Z -> nothing

                System.out.println(line);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

I also tried a while loop:

while ((line = bufferedReader.readLine()) != null) { ... }

Yes, I tried both BufferedReader and CSVReader. I may even have found the Python solution to this:

headers = next(reader, None)  # returns the headers or `None` if the input is empty

if headers:
    writer.writerow(headers)

(Not my code; I don't know how to link to the source.) Main questions:

  • How can I make sure the headers are printed too, in an efficient way (I do not want a copy/pasted piece of code)?
  • And how can I make the reader also write some of the headers vertically (i.e. transpose them)?

Update: [screenshot]

The file contains hundreds of rows of data:
  • No measurement means null.
  • A measurement is an integer or a double(?).
[screenshot]

What should happen:
  • In the time column, the T and Z have to go: T should become a space (" ") and Z an empty string ("").
  • In row 1, column B and onwards should contain only the plant name itself.

Eventually, I should be able to put all of this in a MySQL database in a clean format, so that it can be shown as a D3.js line chart in JavaServer Faces (class?).

If you are dealing with Big Data, I recommend univocity-parsers, as it is much faster than the alternatives. Also, try not to load all rows into memory, which is an obvious problem at that scale; stream them instead. Here's a simple example to get you started:

CsvParserSettings settings = new CsvParserSettings();
settings.detectFormatAutomatically(); //you can configure the format manually if you prefer.
settings.setHeaderExtractionEnabled(true); //you want to get the headers from the input
settings.selectFields("a", "b", "c"); //select just the columns you need.

CsvParser parser = new CsvParser(settings);

File input = Paths.get(fileName).toFile();
parser.beginParsing(input, "UTF-8");

String[] row;
while ((row = parser.parseNext()) != null) {
    //do your stuff here.

    //here are your headers
    String[] headers = parser.getContext().parsedHeaders();
}
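For comparison, here is a minimal JDK-only sketch of the same streaming idea (this is not univocity-parsers; the class name StreamingCsv is hypothetical). It reads the header line first and then hands each data row to a callback, so only one line is held in memory at a time. It assumes a plain comma-separated file with no quoted commas or embedded line breaks; for real CSV quoting rules, a parser library is the safer choice.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.BiConsumer;

// Sketch: stream a simple CSV line by line while keeping the header row.
// Assumption: no quoted commas and no line breaks inside fields.
final class StreamingCsv {

    /** Reads the header first, then passes (headers, row) for each data line to onRow. */
    static String[] stream(BufferedReader in, BiConsumer<String[], String[]> onRow) {
        Iterator<String> lines = in.lines().iterator(); // lazy: lines are read on demand
        if (!lines.hasNext()) {
            return new String[0];                        // empty input: no header, no rows
        }
        String[] headers = lines.next().split(",", -1);  // first line = column names
        while (lines.hasNext()) {
            String line = lines.next();
            if (line.isEmpty()) {
                continue;                                // ignore blank lines
            }
            onRow.accept(headers, line.split(",", -1));  // -1 keeps trailing empty cells
        }
        return headers;
    }

    public static void main(String[] args) {
        String csv = "time,PlantA,PlantB\n2016-01-01T00:00:00Z,400,\n"; // made-up sample
        String[] headers = stream(new BufferedReader(new StringReader(csv)),
                (h, row) -> System.out.println(Arrays.toString(row)));
        System.out.println("headers: " + Arrays.toString(headers));
    }
}
```

The header is returned rather than discarded, so it can be printed or written out once before (or alongside) the data rows.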

Your second question, if I understood it correctly, is that you want to transpose the rows, i.e. have all data of a column associated with its header.

For that, use a ColumnProcessor (this loads all data into memory; I'll show you the alternative later):

ColumnProcessor columnProcessor = new ColumnProcessor();
settings.setProcessor(columnProcessor);

CsvParser parser = new CsvParser(settings);
parser.parse(input, "UTF-8"); //all rows are submitted to the processor created above.

//At the end of the process, you can get your data like this:
Map<String, List<String>> columnValues = new TreeMap<String, List<String>>(columnProcessor.getColumnValuesAsMapOfNames());
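To illustrate the shape of that result, here is a JDK-only sketch (hypothetical class ColumnCollector, not part of univocity-parsers) that collects every value of a column under its header name. It uses a LinkedHashMap to keep the file's column order where the answer uses a TreeMap to sort by name, and it assumes header names are unique. Like ColumnProcessor, it keeps everything in memory.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: transpose rows into header -> list of values (all data held in memory).
// Assumption: header names are unique.
final class ColumnCollector {

    static Map<String, List<String>> byHeader(String[] headers, List<String[]> rows) {
        Map<String, List<String>> columns = new LinkedHashMap<>(); // keeps file column order
        for (String h : headers) {
            columns.put(h, new ArrayList<>());
        }
        for (String[] row : rows) {
            for (int i = 0; i < headers.length; i++) {
                // a row shorter than the header gets null for the missing cells
                columns.get(headers[i]).add(i < row.length ? row[i] : null);
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        String[] headers = {"time", "PlantA"};
        List<String[]> rows = Arrays.asList(new String[]{"t1", "400"}, new String[]{"t2"});
        System.out.println(byHeader(headers, rows));
    }
}
```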

If you have too much data for that, you'll need to perform the transpose operation in batches. Use the BatchedColumnProcessor for that:

BatchedColumnProcessor columnProcessor = new BatchedColumnProcessor(20000 /*runs batches of 20000 rows each*/) {
    @Override
    public void batchProcessed(int rowsInThisBatch) {
        Map<Integer, List<String>> columnsByIndex = getColumnValuesAsMapOfIndexes();

       //process your batch here
    }
};
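The batching idea can be sketched with the JDK alone (hypothetical class BatchedColumns; it only imitates the behaviour described for BatchedColumnProcessor and is not its implementation). Rows are buffered column by column; every batchSize rows the buffered columns are handed to a callback and a fresh buffer is started, so memory stays bounded. Call flush() once more after the last row to emit a partial final batch. It assumes every row has the same number of fields.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Sketch: transpose rows in fixed-size batches (column index -> values of this batch).
// Assumption: all rows have the same number of fields.
final class BatchedColumns {
    private final int batchSize;
    private final Consumer<Map<Integer, List<String>>> onBatch;
    private Map<Integer, List<String>> columns = new LinkedHashMap<>();
    private int rowsInBatch = 0;

    BatchedColumns(int batchSize, Consumer<Map<Integer, List<String>>> onBatch) {
        this.batchSize = batchSize;
        this.onBatch = onBatch;
    }

    void addRow(String[] row) {
        for (int i = 0; i < row.length; i++) {
            columns.computeIfAbsent(i, k -> new ArrayList<>()).add(row[i]);
        }
        if (++rowsInBatch == batchSize) {
            flush();                                   // batch is full: hand it off
        }
    }

    /** Emits whatever is buffered; call once more after the last row. */
    void flush() {
        if (rowsInBatch == 0) {
            return;
        }
        Map<Integer, List<String>> batch = columns;
        columns = new LinkedHashMap<>();               // start a fresh buffer
        rowsInBatch = 0;
        onBatch.accept(batch);
    }

    public static void main(String[] args) {
        BatchedColumns batches = new BatchedColumns(2, batch -> System.out.println(batch));
        batches.addRow(new String[]{"t1", "400"});
        batches.addRow(new String[]{"t2", "410"});
        batches.addRow(new String[]{"t3", "420"});
        batches.flush();                               // emits the last, partial batch
    }
}
```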

This should do what you need. Hope it helps.

Disclaimer: I'm the author of this library. It's open-source and free (Apache 2.0 license).

CSVReader reader = new CSVReader(new FileReader("src\\data\\Belgium.csv"), ',', '"', 1);

The last parameter in the code above asks CSVReader to skip line 1 while reading the file. Use the default (zero) instead, so that it reads the headers as well:

CSVReader reader = new CSVReader(new FileReader("src\\data\\Belgium.csv"), ',', '"', CSVReader.DEFAULT_SKIP_LINES);
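To see why the header disappeared, here is a JDK-only sketch of what a skip-lines count does (hypothetical class SkipLinesDemo; it roughly mirrors the last CSVReader argument and is not opencsv code): the first N lines are discarded before anything is returned, so with 1 the header row is gone and with 0 it is kept.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.List;
import java.util.stream.Collectors;

// Sketch: a "skip lines" count discards the first N lines before returning anything.
final class SkipLinesDemo {

    static List<String> readLines(BufferedReader in, int skipLines) {
        return in.lines().skip(skipLines).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String csv = "time,PlantA\n2016-01-01T00:00:00Z,400\n"; // made-up sample
        System.out.println(readLines(new BufferedReader(new StringReader(csv)), 1)); // header skipped
        System.out.println(readLines(new BufferedReader(new StringReader(csv)), 0)); // header kept
    }
}
```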

Regarding the second question, you would have to write custom logic: read the lines into arrays or lists (which maintain the order) and handle the writing with an incremental index.
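A minimal sketch of that custom logic, assuming the rows have already been read into a List<String[]> (hypothetical class Transpose): an incremental column index walks across the rows, so output row i holds every value of input column i. Because the header is the first input row, the headers end up written vertically in the first output column. Padding short rows with an empty string is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: transpose rows into columns with an incremental index.
// Assumption: missing cells in short rows become "".
final class Transpose {

    static List<String[]> transpose(List<String[]> rows) {
        int width = 0;
        for (String[] r : rows) {
            width = Math.max(width, r.length);       // widest row decides the output count
        }
        List<String[]> out = new ArrayList<>(width);
        for (int col = 0; col < width; col++) {      // incremental column index
            String[] line = new String[rows.size()];
            for (int i = 0; i < rows.size(); i++) {
                String[] r = rows.get(i);
                line[i] = col < r.length ? r[col] : "";
            }
            out.add(line);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[]{"time", "PlantA", "PlantB"},  // header row
                new String[]{"t1", "400", "410"});
        for (String[] line : transpose(rows)) {
            System.out.println(String.join(",", line));
        }
    }
}
```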

The best way to do this is probably to read every value of a column and store it in an array. Then write it into a new, transformed CSV file that prints the entire array in one row, in whatever order you want.

I can't really give you pseudocode, because I am not very familiar with any particular CSV reader library, but it is usually easy to find one and use its Javadoc to implement this.
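A sketch of the approach this answer describes, using only the JDK (hypothetical class ColumnRowWriter): each column's values are stored in a list keyed by header, and each list is written as one CSV row in whatever order the caller chooses. Fields containing a comma, quote or line break are quoted following the common RFC 4180 convention, and null values are written as empty fields; both choices are assumptions.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Sketch: write each stored column as one CSV row, in a caller-chosen order.
final class ColumnRowWriter {

    /** Quotes a field if it contains a comma, quote or line break (RFC 4180 style). */
    static String quote(String v) {
        if (v == null) {
            return "";                                  // missing value -> empty field
        }
        if (v.contains(",") || v.contains("\"") || v.contains("\n") || v.contains("\r")) {
            return "\"" + v.replace("\"", "\"\"") + "\"";
        }
        return v;
    }

    static String write(Map<String, List<String>> columns, List<String> order) {
        StringBuilder sb = new StringBuilder();
        for (String name : order) {
            StringJoiner row = new StringJoiner(",");
            row.add(quote(name));                       // header becomes the first cell
            for (String v : columns.getOrDefault(name, Collections.emptyList())) {
                row.add(quote(v));
            }
            sb.append(row).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, List<String>> cols = new LinkedHashMap<>();
        cols.put("time", Arrays.asList("t1", "t2"));
        cols.put("PlantA", Arrays.asList("400", null));
        System.out.print(write(cols, Arrays.asList("PlantA", "time")));
    }
}
```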

I finally achieved what I was trying to do:

package code;

import com.opencsv.CSVReader;
import com.opencsv.CSVWriter;

import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

public class BelgiumParser {

    public static void main(String[] args) throws IOException {

        String fileName = "src/data/Belgium.csv";

        try (CSVReader reader = new CSVReader(new FileReader(fileName), ',', '"', 1)) {
            String[] nextLine;

            while ((nextLine = reader.readNext()) != null) {

                for (String line : nextLine) {

                    line = line.replaceAll("T", " ");
                    line = line.replaceAll("Z", "");
                    line = line.replaceAll("ActualGenerationPerUnit.mean", "");
                    line = line.replaceAll("Plantname:", "");
                    //Escaping curly braces is a must!
                    line = line.replaceAll("\\{", "");
                    line = line.replaceAll("\\}", "");
                    System.out.println(line);

                }


            }
        }
    }}

Still not efficient enough, but it does the job.
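One caveat with the code above: replaceAll("T", " ") and replaceAll("Z", "") touch every capital T and Z in every cell, so a plant name containing either letter would be damaged, and replaceAll compiles its regex again on every call. A minimal sketch of a more targeted approach (hypothetical class CellCleaner) compiles one pattern once and rewrites only cells that look like ISO-8601 UTC timestamps; the timestamp layout (e.g. 2016-01-01T00:00:00Z) is an assumption. For the header cells it uses the literal String.replace, which needs no brace escaping; the header layout is inferred from the fragments the code above removes and is also an assumption.

```java
import java.util.regex.Pattern;

// Sketch: clean cells without touching T/Z inside plant names.
// Assumption: timestamps look like 2016-01-01T00:00:00Z (seconds/fraction optional).
final class CellCleaner {
    private static final Pattern ISO_UTC =
            Pattern.compile("(\\d{4}-\\d{2}-\\d{2})T(\\d{2}:\\d{2}(?::\\d{2}(?:\\.\\d+)?)?)Z");

    /** "2016-01-01T00:00:00Z" -> "2016-01-01 00:00:00"; other text is left unchanged. */
    static String cleanTimestamp(String cell) {
        return ISO_UTC.matcher(cell).replaceAll("$1 $2");
    }

    /** Hypothetical header layout: strips the same fragments as the code above. */
    static String cleanHeader(String cell) {
        return cell.replace("ActualGenerationPerUnit.mean", "")
                   .replace("Plantname:", "")
                   .replace("{", "")
                   .replace("}", "")
                   .trim();
    }

    public static void main(String[] args) {
        System.out.println(cleanTimestamp("2016-01-01T00:00:00Z"));
        System.out.println(cleanHeader("ActualGenerationPerUnit.mean{Plantname:PlantA}"));
    }
}
```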

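Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.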

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM