
Filtering from csv files using Java stream

I have a CSV file with characters from Star Wars and would like to find the heaviest character using Java streams. Here's a sample of the file:

name;height;mass;hair_color;skin_color;eye_color;birth_year;gender
Luke Skywalker;172;77;blond;fair;blue;19BBY;male
C-3PO;167;75;n/a;gold;yellow;112BBY;n/a
R2-D2;96;32;n/a;white, blue;red;33BBY;n/a
Darth Vader;202;136;none;white;yellow;41.9BBY;male
Leia Organa;150;49;brown;light;brown;19BBY;female
Owen Lars;178;120;brown, grey;light;blue;52BBY;male
Beru Whitesun lars;165;75;brown;light;blue;47BBY;female
Grievous;216;159;none;brown, white;green, yellow;unknown;male
Finn;unknown;unknown;black;dark;dark;unknown;male
Rey;unknown;unknown;brown;light;hazel;unknown;female
Poe Dameron;unknown;unknown;brown;light;brown;unknown;male

Expected output is String "Grievous".

Initially I thought of creating a Character class, where I could store the data and work with objects instead of String array after splitting the line. However, each value can be unknown or n/a, so not too sure how to work around it. Is there a way to achieve this using stream only?

This is my initial attempt, mapping each line to a new Person object with fields name and weight (the mass column); however, this approach does not handle unknown input properly.

public static String getHeaviestCharacter(String file) throws IOException {
    return Files.lines(Paths.get(file))
            .map(line -> line.split(";"))
            .map(part -> new Person(part[0], part[2]))
            .max((p1, p2) -> Integer.compare(p1.getWeight(), p2.getWeight()))
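            .map(Person::getName) // method reference; p1 is not in scope here
            .orElse(null); // max returns an Optional, so unwrap it to match the String return type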
}

I would not recommend doing this with streams alone; a CSV library is much safer.


public static void main(String[] args) {
    // try-with-resources closes the reader even if reading fails
    try (BufferedReader reader = new BufferedReader(new FileReader(new File("characters.csv")))) {

        // Skip first line (the header row)
        reader.readLine();

        Optional<String> optionalHeaviestCharacter = getHeaviestCharactersName(reader.lines());

        System.out.println(optionalHeaviestCharacter);

    } catch (IOException e) {
        e.printStackTrace();
    }
}

public static Optional<String> getHeaviestCharactersName(Stream<String> lineStream) {
    return lineStream
            .map(lineString -> lineString.split(";")) // map every line string to an array with all values
            .filter(values -> values.length > 2 && values[2].matches("[0-9]+")) // filter out short lines and characters with a non-number value as a mass
            .max((values1, values2) -> Integer.compare(Integer.parseInt(values1[2]), Integer.parseInt(values2[2]))) // get element with maximum mass
            .map(heaviestValues -> heaviestValues[0]); // map values array of heaviest character to its name
}

First we read the file, which I have named characters.csv. You will probably need to edit the file path to point to your file.

BufferedReader reader = new BufferedReader(new FileReader(new File("characters.csv")));

Then we read all remaining lines from the file as a Stream<String>, one String per line, by calling the reader.lines() method.

The function getHeaviestCharactersName then returns an Optional<String>. The Optional will be empty when, for example, all characters have an unknown or invalid mass, or when there are no characters at all.
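To see both outcomes without touching a file, here is a minimal sketch that feeds in-memory lines through the same filter-and-max idea. The class name HeaviestDemo, the method name heaviestName, and the sample rows are illustrative assumptions, and Comparator.comparingInt is used in place of the hand-written comparator.

```java
import java.util.Comparator;
import java.util.Optional;
import java.util.stream.Stream;

public class HeaviestDemo {

    // Same idea as getHeaviestCharactersName, written with Comparator.comparingInt.
    static Optional<String> heaviestName(Stream<String> lines) {
        return lines
                .map(line -> line.split(";"))
                // keep only rows that have a mass column made up of digits
                .filter(values -> values.length > 2 && values[2].matches("[0-9]+"))
                .max(Comparator.comparingInt((String[] values) -> Integer.parseInt(values[2])))
                .map(values -> values[0]);
    }

    public static void main(String[] args) {
        // At least one row has a numeric mass, so the Optional holds a name.
        Optional<String> found = heaviestName(Stream.of(
                "Darth Vader;202;136;none;white;yellow;41.9BBY;male",
                "Grievous;216;159;none;brown, white;green, yellow;unknown;male",
                "Finn;unknown;unknown;black;dark;dark;unknown;male"));
        System.out.println(found); // prints Optional[Grievous]

        // Every mass is "unknown", so nothing survives the filter.
        Optional<String> none = heaviestName(Stream.of(
                "Finn;unknown;unknown;black;dark;dark;unknown;male",
                "Rey;unknown;unknown;brown;light;hazel;unknown;female"));
        System.out.println(none); // prints Optional.empty
    }
}
```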

If you are sure that there will always be at least one character with a valid mass, you can just get the name of the heaviest character with optionalHeaviestCharacter.get(). Otherwise you have to check whether the Optional is empty first:

if (optionalHeaviestCharacter.isEmpty()) {
    System.out.println("Could not find a character with the heaviest mass");
} else {
    System.out.println("Heaviest character is " + optionalHeaviestCharacter.get());
}
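If you would rather not branch on isEmpty() (which needs Java 11 or later), the same two messages can be built with Optional#map and Optional#orElse. This is only a sketch; the class name HeaviestMessage and the method name describe are made up for illustration.

```java
import java.util.Optional;

public class HeaviestMessage {

    // Builds the same two messages as the if/else above, without calling get().
    static String describe(Optional<String> optionalHeaviestCharacter) {
        return optionalHeaviestCharacter
                .map(name -> "Heaviest character is " + name)
                .orElse("Could not find a character with the heaviest mass");
    }

    public static void main(String[] args) {
        System.out.println(describe(Optional.of("Grievous"))); // prints Heaviest character is Grievous
        System.out.println(describe(Optional.empty()));
    }
}
```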

You can just get the name by calling

Streams

As others noted, I doubt streams are the best approach to your particular problem. But since you asked, just for fun, I gave it a try. After much web-searching and much trial and error, I seem to have found a solution using streams.

We use NIO.2 classes Path & Files to open the data file.

We define a stream by calling Files.lines .

We omit the header row by calling Stream#skip .

Some of your input rows have the non-numeric value "unknown" in our target third field. So we call Stream#filter to ignore those lines. We extract the third field by using String#split while passing the annoying zero-based index number 2.

To get the highest number in our third column, we need to sort. To sort, we extract the third field in a Comparator created via Comparator.comparingInt . To get the needed int value, we parse the text of the third field using Integer.parseInt .

After sorting, we need to access the last element in the stream, as that should have our character with the greatest weight. This seems clumsy to me, but apparently the way to get the last element of a stream is .reduce( ( first, second ) -> second ).orElse( null ) . I sure wish we had a Stream#last method!

That last element is a String object, a line of text from your input file. So we need to yet again split the string. But this time when we split, we take the first element rather than the third, as our goal is to report the character's name. The first element is identified by the annoying zero-based index number of 0 .

Voilà, we get Grievous as our final result.

Path path = Paths.get( "/Users/basil_dot_work/inputs.csv" );
if ( Files.notExists( path ) ) { throw new IllegalStateException( "Failed to find file at path: " + path ); }

Stream < String > lines;
try { lines = Files.lines( path , StandardCharsets.UTF_8 ); } catch ( IOException e ) { throw new RuntimeException( e ); }
String result =
        lines
                .skip( 1L )  // Skip the header row, with column names.
                .filter(  // Filter out lines whose targeted value is "unknown". We need text made up only of digits.
                        line -> ! line.split( ";" )[ 2 ].equalsIgnoreCase( "unknown" )
                )
                .sorted(  // Sort by extracting third field’s text, then parse to get an `int` value.
                        Comparator.comparingInt( ( String line ) -> Integer.parseInt( line.split( ";" )[ 2 ] ) )
                )
                .reduce( ( first , second ) -> second ).orElse( null ) // Get last element.
                .split( ";" )[ 0 ]; // Extract name of character from first field of our one and only line of input left remaining after processing.

System.out.println( "result = " + result );

result = Grievous
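For comparison, here is a sketch of the same pipeline that asks for the maximum directly with Stream#max, which avoids both the sort and the reduce trick for finding the last element. It assumes the lines are already in memory (the sample rows stand in for the file), and the class name MaxInsteadOfSort and method name heaviest are hypothetical.

```java
import java.util.Comparator;
import java.util.List;

public class MaxInsteadOfSort {

    // Returns the name on the row with the largest mass, or null if no row qualifies.
    static String heaviest(List<String> lines) {
        return lines.stream()
                .skip(1L) // skip the header row
                .filter(line -> !line.split(";")[2].equalsIgnoreCase("unknown"))
                .max(Comparator.comparingInt((String line) -> Integer.parseInt(line.split(";")[2])))
                .map(line -> line.split(";")[0]) // first field is the name
                .orElse(null);
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "name;height;mass;hair_color;skin_color;eye_color;birth_year;gender",
                "Luke Skywalker;172;77;blond;fair;blue;19BBY;male",
                "Grievous;216;159;none;brown, white;green, yellow;unknown;male",
                "Finn;unknown;unknown;black;dark;dark;unknown;male");
        System.out.println("result = " + heaviest(lines)); // prints result = Grievous
    }
}
```

Because the name is extracted with Optional#map before orElse, an input with no usable rows yields null instead of a NullPointerException.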

Be sure to compare my approach here with that of the other Answer, by Florian Hartung. The other may well be better; I have not yet studied it carefully.

Without streams

For comparison, here is more conventional code, with little or no use of streams.

We read lines from the file in the same manner as seen above.

We need to skip the first row, the header row of column titles. But the List object returned by Stream#toList (called on the stream from Files.lines) is unmodifiable, so we cannot simply delete its first element. Instead, we effectively skip the first line by calling lines.subList( 1, lines.size() ). The subList method returns a view mapped back onto the original list rather than a new and separate list. This is efficient and appropriate for our use here.
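A small sketch of that behavior, using List.of as a stand-in for the unmodifiable list of lines. The class name SubListDemo and the helper withoutHeader are illustrative, and the isEmpty guard is an addition for the case of an empty file, where subList( 1, 0 ) would throw.

```java
import java.util.List;

public class SubListDemo {

    // Returns a view of the list without its first element (the header row).
    static List<String> withoutHeader(List<String> lines) {
        return lines.isEmpty() ? lines : lines.subList(1, lines.size());
    }

    public static void main(String[] args) {
        List<String> lines = List.of( // List.of returns an unmodifiable list
                "name;height;mass",
                "Luke Skywalker;172;77",
                "Grievous;216;159");
        try {
            lines.remove(0); // not allowed on an unmodifiable list
        } catch (UnsupportedOperationException e) {
            System.out.println("cannot remove from an unmodifiable list");
        }
        // The view simply starts at index 1 of the original list.
        System.out.println(withoutHeader(lines));
    }
}
```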

We define a class as a record to hold each person's details. We use Integer rather than int so that we can hold a null for the lines that carry unknown text rather than a number.

For each line, we directly transfer the textual items to String member fields. But for height and mass we use a ternary operator to either return null or obtain an Integer object.

We collect our Person objects by adding to a list.

To get the Person object whose mass is the largest, we need to ignore those with a null mass. So we use a simple stream here to make a new list of Person objects with non-null mass. This stream could be replaced with a conventional loop, but that would be more verbose.

With our filtered list, we call Collections.max while passing a Comparator object that compares the mass member field.

We end up with a single Person object. So we interrogate for its name member field.

Voilà, we get Grievous as our final result.

Path path = Paths.get( "/Users/basil_dot_work/inputs.csv" );
if ( Files.notExists( path ) ) { throw new IllegalStateException( "Failed to find file at path: " + path ); }

List < String > lines;
try { lines = Files.lines( path , StandardCharsets.UTF_8 ).toList(); } catch ( IOException e ) { throw new RuntimeException( e ); }
lines = lines.subList( 1 , lines.size() ); // Skip over first line.

record Person( String name , Integer height , Integer mass , String hair_color , String skin_color , String eye_color , String birth_year , String gender ) { }
List < Person > persons = new ArrayList <>();
for ( String line : lines )
{
    String[] parts = line.split( ";" );

    Integer height = ( parts[ 1 ].equalsIgnoreCase( "unknown" ) ) ? null : Integer.valueOf( parts[ 1 ] );
    Integer mass = ( parts[ 2 ].equalsIgnoreCase( "unknown" ) ) ? null : Integer.valueOf( parts[ 2 ] );
    Person person = new Person( parts[ 0 ] , height , mass , parts[ 3 ] , parts[ 4 ] , parts[ 5 ] , parts[ 6 ] , parts[ 7 ] );
    persons.add( person );
}
System.out.println( "persons = " + persons );
List < Person > personsWithMass = persons.stream().filter( person -> Objects.nonNull( person.mass() ) ).toList();
Person heaviestPerson = Collections.max( personsWithMass , Comparator.comparing( Person :: mass ) );

System.out.println( "heaviest Person's name = " + heaviestPerson.name() );

heaviest Person's name = Grievous
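For completeness, here is a sketch of the fully conventional variant with no stream at all: a plain loop that tracks the largest mass seen so far. The helper parseOrNull is a hypothetical addition that treats anything other than plain digits (such as "unknown" or "n/a") as missing, and the class name HeaviestLoop is likewise made up. The header row is assumed to have been skipped already.

```java
import java.util.List;

public class HeaviestLoop {

    // Returns the number, or null when the text is not 1 to 9 plain digits (e.g. "unknown", "n/a").
    static Integer parseOrNull(String text) {
        return text.matches("[0-9]{1,9}") ? Integer.valueOf(text) : null;
    }

    // Returns the name with the largest mass, or null if no line has a numeric mass.
    static String heaviestName(List<String> dataLines) {
        String bestName = null;
        int bestMass = -1; // parsed masses are never negative, so any valid mass beats this
        for (String line : dataLines) {
            String[] parts = line.split(";");
            if (parts.length < 3) {
                continue; // skip malformed or blank lines
            }
            Integer mass = parseOrNull(parts[2]);
            if (mass != null && mass > bestMass) {
                bestMass = mass;
                bestName = parts[0];
            }
        }
        return bestName;
    }

    public static void main(String[] args) {
        List<String> dataLines = List.of(
                "Luke Skywalker;172;77;blond;fair;blue;19BBY;male",
                "Grievous;216;159;none;brown, white;green, yellow;unknown;male",
                "Finn;unknown;unknown;black;dark;dark;unknown;male");
        System.out.println("heaviest = " + heaviestName(dataLines)); // prints heaviest = Grievous
    }
}
```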


 