Reading a CSV file and storing data in Java

I have a task that requires me to read a CSV file in Java. I can read the file, but I don't think I am storing the data in a way that lets me access it in later tasks, such as analyzing some of the data or building a graph. The header of the CSV file lists several variables; some hold numbers and some hold text, which means I need to store them as Integer or String.

Note that I did not use a library such as openCSV to read the file, because I am a beginner and am trying to get familiar with basic Java.

Below is the nycflights13 class, which reads and stores the data. The instructions I was given say not to take in any rows that contain the word "NA".

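    import java.io.BufferedReader;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    public class nycflights13 {

        public static void main(String[] args) {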
            List<Flights> NYC13 = readFileFromCSV("flights.csv");

            for(Flights a: NYC13) {
                System.out.println(a);
            }
        }

        public static List<Flights> readFileFromCSV (String fileName){
            List<Flights> flightData = new ArrayList <> (); 
            Path pathToFile = Paths.get(fileName);

            try(BufferedReader br = Files.newBufferedReader(pathToFile,
                    StandardCharsets.US_ASCII)){
                br.readLine();
                String line = br.readLine();

                while (line != null) {
                    String [] variable = line.split(",");

                    //convert string array to list
                    List<String> list = Arrays.asList(variable);
                    if(list.contains("NA")) { //Do not take in rows containing "NA"
                        break;
                    } else {
                        Flights dataset = createFlights(variable);
                        flightData.add(dataset);
                    }
                    line = br.readLine();
                }
            }catch (IOException ioe) {
                ioe.printStackTrace();
            }

            return flightData;
        }



        private static Flights createFlights (String [] metadata) {
            int year = Integer.parseInt(metadata[1]); //convert string into int
            int month = Integer.parseInt(metadata[2]); //convert string into int
            int day = Integer.parseInt(metadata[3]); //convert string into int
            int dep_time = Integer.parseInt(metadata[4]); //convert string into int
            String carrier = metadata[10];
            String flight = metadata[11];
            String origin = metadata[13];
            String dest = metadata[14]; 

            return new Flights(year, month, day, dep_time,carrier, flight, origin, dest);
        }

    }

Below is my Flights class (it has many more variables than shown here):

    class Flights {
        private int year; 
        private int month; 
        private int day; 
        private int dep_time;
        private String carrier; 
        private String flight; 
        private String origin;
        private String dest; 

        public Flights(int year, int month, int day, int dep_time, String carrier, String flight, String origin, String dest) {
            this.year = year; 
            this.month = month; 
            this.day = day; 
            this.dep_time = dep_time;
            this.carrier = carrier; 
            this.flight = flight; 
            this.origin = origin; 
            this.dest = dest; 
        }

        public int getYear() {return year;}
        public void setYear(int year) {this.year = year;}

        public int getMonth() {return month;}
        public void setMonth(int month) {this.month = month; }

        public int getDay() {return day;}
        public void setDay(int day) {this.day = day; }

        public int getdep_time() {return dep_time;}
        public void setdep_time(int dep_time) {this.dep_time = dep_time; }

        ............
        .............
        ...........


        @Override
        public String toString() {
            return "Flights [year=" + year + ", month=" + month + ", day=" + day
                + ", dep_time=" + dep_time + ", carrier=" + carrier + ", flight=" + flight
                + ", origin=" + origin + ", dest=" + dest
                + ", air_time=" + air_time + ", distance=" + distance
                + ", hour=" + hour + ", minute=" + minute
                + ", time_hour=" + time_hour + "]";
        }
    }

The above code gives me the following result:

Flights [year=2013, month=1, day=1, dep_time=926, sched_dep_time=929, dep_delay=-3, arr_time=1404, sched_arr_time=1421, arr_delay=-17, carrier="B6", flight=215, tailnum="N775JB", origin="EWR", dest="SJU", air_time=191, distance=1608, hour=9, minute=29, time_hour=2013-01-01 09:00:00]

Flights [year=2013, month=1, day=1, dep_time=926, sched_dep_time=922, dep_delay=4, arr_time=1221, sched_arr_time=1219, arr_delay=2, carrier="B6", flight=57, tailnum="N534JB", origin="JFK", dest="PBI", air_time=151, distance=1028, hour=9, minute=22, time_hour=2013-01-01 09:00:00]

Flights [year=2013, month=1, day=1, dep_time=926, sched_dep_time=928, dep_delay=-2, arr_time=1233, sched_arr_time=1220, arr_delay=13, carrier="UA", flight=1597, tailnum="N27733", origin="EWR", dest="EGE", air_time=287, distance=1726, hour=9, minute=28, time_hour=2013-01-01 09:00:00]

Flights [year=2013, month=1, day=1, dep_time=927, sched_dep_time=930, dep_delay=-3, arr_time=1231, sched_arr_time=1257, arr_delay=-26, carrier="DL", flight=1335, tailnum="N951DL", origin="LGA", dest="RSW", air_time=166, distance=1080, hour=9, minute=30, time_hour=2013-01-01 09:00:00]

I have a few questions:

  1. My CSV file actually contains more than 300k rows, but with the code above I only manage to print about 280 lines. Has the code gone wrong, or does Eclipse have an upper limit on the number of lines it prints?

  2. How can I access a particular variable from the List<Flights>, such as carrier or month, for example to calculate the total count of carriers or to count the frequency of each month?

  3. What is the correct way to store data with multiple variables so that I can access it from another class? Alternatively, how can I improve my current code?

I appreciate your feedback and time. Thanks a million.

Answering your queries:

  1. If you did not get any error or exception while executing the code, you need not worry. Eclipse's default console buffer size is limited. See https://javarevisited.blogspot.com/2013/03/how-to-increase-console-buffer-size-in.html

  2. Now that you have read the data, you should go ahead and save it in a database. Once the data is in a database, you can run whatever queries you need to get the data that satisfies your conditions.

  3. I did not understand what you mean by 'ways to store data with multiple variables'. Could you clarify?
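
On question 1, it may also be worth checking the loop itself. In the posted readFileFromCSV, the `break` inside `if(list.contains("NA"))` leaves the `while` loop at the first row containing "NA", so every later row is dropped regardless of the console buffer. Swapping in `continue` there would skip only that row, but `line = br.readLine()` must then run before the `continue`, otherwise the loop never advances. On questions 2 and 3, counts such as flights per month can also be computed in memory through the getters, without a database. Below is a minimal sketch of both ideas. `FlightCsvSketch`, the trimmed-down `Flight` class, `parse`, `countByMonth`, and `distinctCarriers` are hypothetical names, the sample rows are made up and shortened, and the column positions (month at index 2, carrier at index 10) are assumed from the question's createFlights.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical sketch: all names here are illustrative, not from the question.
class FlightCsvSketch {

    // Trimmed-down stand-in for the question's Flights class.
    static final class Flight {
        private final int month;
        private final String carrier;

        Flight(int month, String carrier) {
            this.month = month;
            this.carrier = carrier;
        }

        int getMonth() { return month; }
        String getCarrier() { return carrier; }
    }

    // Parses data lines (header already removed), skipping rows with an "NA" field.
    static List<Flight> parse(List<String> lines) {
        List<Flight> flights = new ArrayList<>();
        for (String line : lines) {
            String[] fields = line.split(",");
            if (Arrays.asList(fields).contains("NA")) {
                continue; // skip this row only; 'break' would end the whole loop
            }
            // Assumed column positions: month at index 2, carrier at index 10.
            flights.add(new Flight(Integer.parseInt(fields[2]), fields[10]));
        }
        return flights;
    }

    // Number of flights per month, with keys sorted by month.
    static Map<Integer, Long> countByMonth(List<Flight> flights) {
        return flights.stream()
                .collect(Collectors.groupingBy(Flight::getMonth, TreeMap::new, Collectors.counting()));
    }

    // Number of distinct carrier codes.
    static long distinctCarriers(List<Flight> flights) {
        return flights.stream().map(Flight::getCarrier).distinct().count();
    }

    public static void main(String[] args) {
        // Made-up, shortened sample rows; the second one contains "NA".
        List<String> lines = Arrays.asList(
            "1,2013,1,1,926,929,-3,1404,1421,-17,\"B6\",215",
            "2,2013,1,1,NA,930,NA,NA,1200,NA,\"UA\",100",
            "3,2013,2,1,927,930,-3,1231,1257,-26,\"DL\",1335");
        List<Flight> flights = parse(lines);
        System.out.println("rows kept: " + flights.size());
        System.out.println("flights per month: " + countByMonth(flights));
        System.out.println("distinct carriers: " + distinctCarriers(flights));
    }
}
```

A `for` loop over the lines avoids the advance-before-`continue` pitfall of the `while` version. Because the list holds plain objects with getters, any other class that receives the `List` can run the same kind of query.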
