
Java: sort a CSV file based on a date column

I need to sort a CSV file based on its date column. This is what the masterRecords ArrayList looks like:

GBEP-1-2-4,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 15 2014  - 07:15:00 AM MYT,+0,COMPL
GBEP-1-2-1,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 15 2014  - 07:00:00 AM MYT,+0,COMPL
GBEP-2-2-1,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 15 2014  - 07:30:00 AM MYT,+0,COMPL

I need to sort the records by the time in the date column (07:00:00, 07:15:00, 07:30:00, etc.). I wrote this code to sort them:

// Date is fixed on per 15min interval
ArrayList<String> sortDate = new ArrayList<String>();
sortDate.add(":00:");
sortDate.add(":15:");
sortDate.add(":30:");
sortDate.add(":45:");

BufferedWriter bw = new BufferedWriter(new FileWriter(tempPath + filename));

for (int k = 0; k < sortDate.size(); k++) {
    String date = sortDate.get(k);
    for (int j = 0; j < masterRecords.size(); j++) {
        String[] splitLine = masterRecords.get(j).split(",", -1);
        if (splitLine[10].contains(date)) {
            bw.write(masterRecords.get(j) + System.getProperty("line.separator").replaceAll(String.valueOf((char) 0x0D), ""));
            masterRecords.remove(j);
        }
    }
}
bw.close();

As you can see, the code loops through the first list (sortDate) and, for each entry, loops through the second list (masterRecords), writing each matching record to a new file. It seems to work, since the new file is sorted, but I noticed that masterRecords has 10,000 records while after creating the new file the record count shrinks to 5,000. I'm assuming it's because of how I remove records from the master list. Does anyone know why?

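It is not safe to remove an item from a list inside an index-based loop: masterRecords.remove(j) shifts every later record one slot to the left, so the record that moves into slot j is skipped when j is incremented. Iterate with an Iterator and remove through it instead, for example: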

List<String> names = ....
Iterator<String> i = names.iterator();
while (i.hasNext()) {
   String s = i.next(); // must be called before you can call i.remove()
   // Do something
   i.remove();
}
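To see why roughly half of the records disappear, here is a small stdlib-only sketch contrasting the question's index-based removal with removal through an Iterator. The class and method names (RemoveWhileLooping, moveByIndex, moveByIterator) and the four sample strings are made up for illustration; the "written" list stands in for the lines the question writes to the file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class RemoveWhileLooping {

    // Mimics the question's inner loop: collect each matching record, then remove it by index.
    // Bug: remove(j) shifts the next record into slot j, and j++ then steps past it.
    static List<String> moveByIndex(List<String> records, String token) {
        List<String> written = new ArrayList<>();
        for (int j = 0; j < records.size(); j++) {
            if (records.get(j).contains(token)) {
                written.add(records.get(j));
                records.remove(j);
            }
        }
        return written;
    }

    // Same job through an Iterator: Iterator.remove() keeps the cursor in step with the list.
    static List<String> moveByIterator(List<String> records, String token) {
        List<String> written = new ArrayList<>();
        Iterator<String> it = records.iterator();
        while (it.hasNext()) {
            String rec = it.next();
            if (rec.contains(token)) {
                written.add(rec);
                it.remove();
            }
        }
        return written;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList("a :00:", "b :00:", "c :00:", "d :00:");
        System.out.println(moveByIndex(new ArrayList<>(sample), ":00:"));    // [a :00:, c :00:]
        System.out.println(moveByIterator(new ArrayList<>(sample), ":00:")); // [a :00:, b :00:, c :00:, d :00:]
    }
}
```

With four matching records, the index loop collects only the first and third; the skipped ones stay behind and never match a later minute value, which is how records go missing from the output.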

The ArrayList documentation says:

The iterators returned by this class's iterator and listIterator methods are fail-fast: if the list is structurally modified at any time after the iterator is created, in any way except through the iterator's own remove or add methods, the iterator will throw a ConcurrentModificationException. Thus, in the face of concurrent modification, the iterator fails quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future.
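A short sketch of that fail-fast behaviour, assuming an ArrayList as in the question (FailFastDemo and its method names are hypothetical). Removing through the list itself inside a for-each loop makes the hidden iterator throw ConcurrentModificationException on its next step. The second method uses Collection.removeIf, a Java 8 alternative not mentioned in the answer, which does the remove-while-iterating safely in one call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;

public class FailFastDemo {

    // Removes through the list (not the iterator) inside a for-each loop.
    // For ArrayList the hidden iterator notices the structural change on its
    // next call to next() and throws. The JDK describes this as best-effort,
    // so treat it as a bug detector, not as something to rely on.
    static boolean forEachRemoveThrows(List<String> records, String token) {
        try {
            for (String rec : records) {
                if (rec.contains(token)) {
                    records.remove(rec);
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Java 8 Collection.removeIf: removes every matching element safely in one call.
    static List<String> removeMatching(List<String> records, String token) {
        records.removeIf(rec -> rec.contains(token));
        return records;
    }

    public static void main(String[] args) {
        System.out.println(forEachRemoveThrows(
                new ArrayList<>(Arrays.asList("a :00:", "b :15:", "c :30:")), ":00:")); // true
        System.out.println(removeMatching(
                new ArrayList<>(Arrays.asList("a :00:", "b :15:", "c :30:")), ":00:")); // [b :15:, c :30:]
    }
}
```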

The accepted answer by Lautaro Cozzani is correct.

And Now for Something Completely Different

For fun, here is an entirely different approach.

I used two libraries:

Apache Commons CSV

The Commons CSV library handles the parsing of various flavors of CSV. It can return a List of the rows from the file, each row represented by a CSVRecord object. You can ask that object for the first field, second field, and so on.
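The value of a real CSV parser shows up once a field contains a comma. Here is a small stdlib-only illustration using a made-up row (the question's data has no quoted fields, so a plain split happens to work there). NaiveCsvSplit and naiveSplit are hypothetical names.

```java
public class NaiveCsvSplit {

    // Splits on every comma, as the question's code does with split(",", -1).
    static String[] naiveSplit(String row) {
        return row.split(",", -1);
    }

    public static void main(String[] args) {
        // Hypothetical row: the middle field is quoted because it contains a comma.
        String row = "GBEP-1-2-4,\"Kuala Lumpur, MY\",COMPL";
        String[] fields = naiveSplit(row);
        // A CSV-aware parser would report 3 fields; the naive split ignores the quotes.
        System.out.println(fields.length); // 4
    }
}
```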

Joda-Time

Joda-Time does the work of parsing the date-time strings.

Avoid 3-letter Time Zone Codes

Beware: Joda-Time refuses to try to parse the three-letter time zone code MYT, for good reason: those 3- or 4-letter codes are mere conventions, neither standardized nor unique. My example code below assumes all your data uses MYT, and assigns the proper time zone name, Asia/Kuala_Lumpur. I suggest you encourage whoever creates your input data to learn about proper time zone names and about ISO 8601 string formats.
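Joda-Time is not part of the JDK, so as a stdlib-only point of comparison here is a sketch of the same idea with java.time (the date-time API built into Java 8), swapped in for illustration: MYT is matched as literal text in the pattern, and the zone Asia/Kuala_Lumpur is attached explicitly by name. The pattern assumes the two spaces before the hyphen seen in the question's data; the class name MytParsing is hypothetical.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class MytParsing {

    // 'MYT' is quoted, so it is matched as literal text and never interpreted as a zone.
    // Locale.ENGLISH pins the month names ("Dec") and AM/PM markers.
    static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("MMM dd yyyy'  - 'hh:mm:ss a 'MYT'", Locale.ENGLISH);
    static final ZoneId ZONE = ZoneId.of("Asia/Kuala_Lumpur");

    // Parse the zone-less local date-time, then attach the real zone.
    static ZonedDateTime parse(String input) {
        return LocalDateTime.parse(input, FORMAT).atZone(ZONE);
    }

    public static void main(String[] args) {
        ZonedDateTime zdt = parse("Dec 15 2014  - 07:15:00 AM MYT");
        System.out.println(zdt);             // 2014-12-15T07:15+08:00[Asia/Kuala_Lumpur]
        System.out.println(zdt.getMinute()); // 15
    }
}
```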

Java 8

My example code requires Java 8, as it uses the new lambda syntax and streams.

Example Code

This example does a two-level sort. First the rows are sorted by minute-of-hour (00, 15, 30, 45). Within each of those groups, the rows are sorted by their date-time value (ordered by year, month, day-of-month, and time-of-day).
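The two-level ordering just described amounts to a comparator with two keys. Here is a compact sketch of that idea, using java.time instead of Joda-Time and plain String.split instead of Commons CSV; both swaps are assumptions made to keep it stdlib-only, and the split is only safe because the question's rows contain no quoted commas. TwoLevelSort, dateOf, and sort are hypothetical names.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

public class TwoLevelSort {

    // Same layout as the question's data: the date-time is the 7th column (index 6).
    static final int DATE_FIELD = 6;
    static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("MMM dd yyyy'  - 'hh:mm:ss a 'MYT'", Locale.ENGLISH);

    static LocalDateTime dateOf(String row) {
        return LocalDateTime.parse(row.split(",", -1)[DATE_FIELD], FORMAT);
    }

    // Primary key: minute-of-hour. Secondary key: the full date-time.
    // The date is re-parsed on every comparison; precomputing it per row scales better.
    static List<String> sort(List<String> rows) {
        List<String> sorted = new ArrayList<>(rows);
        sorted.sort(Comparator.comparingInt((String row) -> dateOf(row).getMinute())
                .thenComparing(TwoLevelSort::dateOf));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
                "GBEP-1-2-4,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 15 2014  - 07:15:00 AM MYT,+0,COMPL",
                "GBEP-1-2-1,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 15 2014  - 07:00:00 AM MYT,+0,COMPL",
                "GBEP-2-2-1,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Jan 22 2014  - 07:30:00 AM MYT,+0,COMPL",
                "GBEP-1-2-1,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Jan 22 2014  - 07:00:00 AM MYT,+0,COMPL");
        sort(rows).forEach(System.out::println);
    }
}
```

Compared with grouping into a map and sorting each group, a single comparator chain sorts everything in one pass; the wrapper-class approach below avoids the repeated parsing.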

First we open the .csv text file and parse its contents into CSVRecord objects.

String filePathString = "/Users/brainydeveloper/input.csv";
try ( Reader in = new FileReader( filePathString ) ) { // Get the input file; try-with-resources closes it when done.
    List<CSVRecord> recs = CSVFormat.DEFAULT.parse( in ).getRecords(); // Parse the input file.

Next we wrap each of those CSVRecord objects inside a smarter class that extracts the two values we care about: first the DateTime, and second the minute-of-hour of that DateTime. See further down for the simple code of that class, CsvRecWithDateTimeAndMinute.

    List<CsvRecWithDateTimeAndMinute> smartRecs = new ArrayList<>( recs.size() ); // Collect transformed data.
    for ( CSVRecord rec : recs ) { // For each CSV record…
        CsvRecWithDateTimeAndMinute smartRec = new CsvRecWithDateTimeAndMinute( rec ); // …transform CSV rec into one of our objects with DateTime and minute-of-hour.
        smartRecs.add( smartRec );
    }

Next we take that list of smarter wrapped objects and break it into multiple lists. Each new list contains the CSV row data for one particular minute-of-hour (00, 15, 30, or 45). We store these lists in a map.

If our input data contains only those four values, the resulting map will have only four keys. Indeed, you can do a sanity check by looking for more than four keys. Extra keys would mean either something went terribly wrong in parsing or some rows have unexpected minute-of-hour values.

Each key (the minute-of-hour as an Integer) leads to a List of our smart wrapper objects. Here is some of that fancy new lambda syntax.

    Map<Integer , List<CsvRecWithDateTimeAndMinute>> byMinuteOfHour = smartRecs.stream().collect( Collectors.groupingBy( CsvRecWithDateTimeAndMinute::getMinuteOfHour ) );

The map does not guarantee that its keys (the minute-of-hour Integers) come back in sorted order; we might get the 15 group before the 00 group. So extract the keys and sort them.

    // Access the map by the minuteOfHour value in order. We want ":00:" first, then ":15:", then ":30:", and ":45:" last.
    List<Integer> minutes = new ArrayList<Integer>( byMinuteOfHour.keySet() ); // Fetch the keys of the map.
    Collections.sort( minutes ); // Sort that List of keys.
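Alternatively, groupingBy can produce a sorted map directly when given TreeMap::new as the map factory, which removes the separate key-sorting step. A minimal sketch on plain "hh:mm:ss" strings; the class, method, and sample values are illustrative, and the four-key check mirrors the sanity check suggested above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class SortedGrouping {

    // Groups "hh:mm:ss" strings by their minute field into a TreeMap,
    // so iteration over the keys is ascending (0, 15, 30, 45).
    static Map<Integer, List<String>> groupByMinute(List<String> times) {
        return times.stream().collect(Collectors.groupingBy(
                t -> Integer.parseInt(t.substring(3, 5)), // "07:15:00" -> 15
                TreeMap::new,
                Collectors.toList()));
    }

    public static void main(String[] args) {
        Map<Integer, List<String>> groups =
                groupByMinute(Arrays.asList("07:30:00", "07:00:00", "08:15:00", "07:15:00", "08:00:00"));
        System.out.println(groups); // {0=[07:00:00, 08:00:00], 15=[08:15:00, 07:15:00], 30=[07:30:00]}
        // Sanity check: more than four keys means unexpected minute-of-hour values.
        if (groups.size() > 4) {
            System.out.println("Unexpected minute-of-hour values: " + groups.keySet());
        }
    }
}
```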

Walking that list of ordered keys, ask the map for each key's list. Each of those lists must itself be sorted to get our second-level sort (by date-time).

    List<CSVRecord> outputList = new ArrayList<>( recs.size() ); // Make an empty List in which to put our CSVRecords in double-sorted order.
    for ( Integer minute : minutes ) {
        List<CsvRecWithDateTimeAndMinute> list = byMinuteOfHour.get( minute );
        // Secondary sort. For each group of records with ":00:" (for example), sort them by their full date-time value.
        // Sort the List by defining an anonymous Comparator using new Lambda syntax in Java 8.
        Collections.sort( list , ( CsvRecWithDateTimeAndMinute r1 , CsvRecWithDateTimeAndMinute r2 ) -> {
            return r1.getDateTime().compareTo( r2.getDateTime() );
        } );
        for ( CsvRecWithDateTimeAndMinute smartRec : list ) {
            outputList.add( smartRec.getCSVRecord() );
        }
    }

We are done manipulating the data. Now it is time to export it back out to a text file in CSV format.

    // Now we have complete List of CSVRecord objects in double-sorted order (first by minute-of-hour, then by date-time).
    // Now let's dump those back to a text file in CSV format.
    try ( PrintWriter out = new PrintWriter( new BufferedWriter( new FileWriter( "/Users/brainydeveloper/output.csv" ) ) ) ) {
        final CSVPrinter printer = CSVFormat.DEFAULT.print( out );
        printer.printRecords( outputList );
    }

} catch ( FileNotFoundException ex ) {
    System.out.println( "ERROR - Exception needs to be handled." );
} catch ( IOException ex ) {
    System.out.println( "ERROR - Exception needs to be handled." );
}

The code above loads the entire CSV data set into memory at once. If you wish to conserve memory, iterate over the CSVParser returned by the parse method, one record at a time, rather than calling its getRecords method. At least, that is what the documentation seems to say. I have not experimented with that, as my use cases so far all fit easily into memory.

Here is that smart class to wrap each CSVRecord object:

package com.example.jodatimeexperiment;

import java.util.Locale;

import org.apache.commons.csv.CSVRecord;
import org.joda.time.DateTime;
import org.joda.time.DateTimeZone;
import org.joda.time.format.DateTimeFormat;
import org.joda.time.format.DateTimeFormatter;

/**
 *
 * @author Basil Bourque
 */
public class CsvRecWithDateTimeAndMinute
{

    // Statics
    // The month names and AM/PM markers are English, so pin the locale rather than relying on the JVM default.
    public static final DateTimeFormatter FORMATTER = DateTimeFormat.forPattern( "MMM dd yyyy'  - 'hh:mm:ss aa 'MYT'" ).withLocale( Locale.ENGLISH ).withZone( DateTimeZone.forID( "Asia/Kuala_Lumpur" ) );

    // Member vars.
    private final CSVRecord rec;
    private final DateTime dateTime;
    private final Integer minuteOfHour;

    public CsvRecWithDateTimeAndMinute( CSVRecord recordArg )
    {
        this.rec = recordArg;
        // Parse record to extract DateTime.
        // Expect value such as: Dec 15 2014  - 07:15:00 AM MYT
        String input = this.rec.get( 7 - 1 );  // Index (zero-based counting). So field # 7 = index # 6.
        this.dateTime = CsvRecWithDateTimeAndMinute.FORMATTER.parseDateTime( input );
        // From DateTime extract minute of hour
        this.minuteOfHour = this.dateTime.getMinuteOfHour();
    }

    public DateTime getDateTime()
    {
        return this.dateTime;
    }

    public Integer getMinuteOfHour()
    {
        return this.minuteOfHour;
    }

    public CSVRecord getCSVRecord()
    {
        return this.rec;
    }

    @Override
    public String toString()
    {
        return "CsvRecWithDateTimeAndMinute{ " + " minuteOfHour=" + minuteOfHour + " | dateTime=" + dateTime + " | rec=" + rec + " }";
    }

}

With this input…

GBEP-1-2-4,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 15 2014  - 07:15:00 AM MYT,+0,COMPL
GBEP-1-2-1,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 15 2014  - 07:00:00 AM MYT,+0,COMPL
GBEP-2-2-1,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 15 2014  - 07:30:00 AM MYT,+0,COMPL
GBEP-1-2-4,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 14 2014  - 07:15:00 AM MYT,+0,COMPL
GBEP-1-2-1,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 14 2014  - 07:00:00 AM MYT,+0,COMPL
GBEP-2-2-1,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 14 2014  - 07:30:00 AM MYT,+0,COMPL
GBEP-1-2-4,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Jan 22 2014  - 07:15:00 AM MYT,+0,COMPL
GBEP-1-2-1,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Jan 22 2014  - 07:00:00 AM MYT,+0,COMPL
GBEP-2-2-1,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Jan 22 2014  - 07:30:00 AM MYT,+0,COMPL

…you will get this output…

GBEP-1-2-1,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Jan 22 2014  - 07:00:00 AM MYT,+0,COMPL
GBEP-1-2-1,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 14 2014  - 07:00:00 AM MYT,+0,COMPL
GBEP-1-2-1,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 15 2014  - 07:00:00 AM MYT,+0,COMPL
GBEP-1-2-4,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Jan 22 2014  - 07:15:00 AM MYT,+0,COMPL
GBEP-1-2-4,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 14 2014  - 07:15:00 AM MYT,+0,COMPL
GBEP-1-2-4,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 15 2014  - 07:15:00 AM MYT,+0,COMPL
GBEP-2-2-1,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Jan 22 2014  - 07:30:00 AM MYT,+0,COMPL
GBEP-2-2-1,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 14 2014  - 07:30:00 AM MYT,+0,COMPL
GBEP-2-2-1,FRAG,PMTypeEthernet,NEND,TDTN,15-MIN,Dec 15 2014  - 07:30:00 AM MYT,+0,COMPL

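Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.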

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM