Filtering from csv files using Java stream

I have a CSV file with characters from SW and would like to find the heaviest character using Java streams. Here's a sample of the file:

name;height;mass;hair_color;skin_color;eye_color;birth_year;gender
Luke Skywalker;172;77;blond;fair;blue;19BBY;male
C-3PO;167;75;n/a;gold;yellow;112BBY;n/a
R2-D2;96;32;n/a;white, blue;red;33BBY;n/a
Darth Vader;202;136;none;white;yellow;41.9BBY;male
Leia Organa;150;49;brown;light;brown;19BBY;female
Owen Lars;178;120;brown, grey;light;blue;52BBY;male
Beru Whitesun lars;165;75;brown;light;blue;47BBY;female
Grievous;216;159;none;brown, white;green, yellow;unknown;male
Finn;unknown;unknown;black;dark;dark;unknown;male
Rey;unknown;unknown;brown;light;hazel;unknown;female
Poe Dameron;unknown;unknown;brown;light;brown;unknown;male

Expected output is the String "Grievous".

Initially I thought of creating a Character class, where I could store the data and work with objects instead of a String array after splitting the line. However, each value can be unknown or n/a, so I'm not sure how to work around that. Is there a way to achieve this using streams only?

This is my initial attempt, mapping each line to a new Person object with fields name and weight; however, this approach does not handle unknown input properly.

public static String getHeaviestCharacter(String file) throws IOException {
    return Files.lines(Paths.get(file))
            .map(line -> line.split(";"))
            .map(part -> new Person(part[0], part[2]))
            .max((p1, p2) -> Integer.compare(p1.getWeight(), p2.getWeight()))
            .map(Person::getName) // max(...) returns an Optional<Person>
            .orElse(null);
}

I would not recommend doing this with Streams, but rather with a CSV library, as that is much safer.


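import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.Optional;
import java.util.stream.Stream;

public static void main(String[] args) {
    // try-with-resources closes the reader even if reading fails
    try (BufferedReader reader = new BufferedReader(new FileReader(new File("characters.csv")))) {

        // Skip first line (the header row with the column names)
        reader.readLine();

        Optional<String> optionalHeaviestCharacter = getHeaviestCharactersName(reader.lines());

        System.out.println(optionalHeaviestCharacter);

    } catch (IOException e) {
        e.printStackTrace();
    }
}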

public static Optional<String> getHeaviestCharactersName(Stream<String> lineStream) {
    return lineStream
            .map(lineString -> lineString.split(";")) // map every line string to an array with all values
            .filter(values -> values[2].matches("[0-9]+")) // filter out characters with a non-number value as a mass
            .max((values1, values2) -> Integer.compare(Integer.parseInt(values1[2]), Integer.parseInt(values2[2]))) // get element with maximum mass
            .map(heaviestValues -> heaviestValues[0]); // map values array of heaviest character to its name
}

First we read the file, which I have named characters.csv. You will probably need to edit the file path to point to your file.

BufferedReader reader = new BufferedReader(new FileReader(new File("characters.csv")));

Then we read all lines from the file, each line as a String in the Stream<String>, by calling the reader.lines() method.

The function getHeaviestCharactersName will then return an Optional<String>. The Optional will be empty when, for example, all characters have an unknown/invalid mass or when there are no characters present at all.
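To see both outcomes without touching a file, here is a small sketch that feeds in-memory lines to the same pipeline. The class name HeaviestDemo and the sample rows are only for illustration, and the extra length check is my own addition, not part of the code above.

```java
import java.util.Optional;
import java.util.stream.Stream;

public class HeaviestDemo {

    // Same pipeline as getHeaviestCharactersName above, repeated so the sketch compiles on its own.
    // Assumption: the added length check skips lines with fewer than three fields.
    static Optional<String> getHeaviestCharactersName(Stream<String> lineStream) {
        return lineStream
                .map(line -> line.split(";"))
                .filter(values -> values.length > 2 && values[2].matches("[0-9]+"))
                .max((a, b) -> Integer.compare(Integer.parseInt(a[2]), Integer.parseInt(b[2])))
                .map(values -> values[0]);
    }

    public static void main(String[] args) {
        Optional<String> found = getHeaviestCharactersName(Stream.of(
                "Darth Vader;202;136;none;white;yellow;41.9BBY;male",
                "Grievous;216;159;none;brown, white;green, yellow;unknown;male",
                "Finn;unknown;unknown;black;dark;dark;unknown;male"));
        Optional<String> none = getHeaviestCharactersName(Stream.of(
                "Finn;unknown;unknown;black;dark;dark;unknown;male",
                "Rey;unknown;unknown;brown;light;hazel;unknown;female"));

        System.out.println(found.isPresent()); // true: at least one mass is numeric
        System.out.println(none.isPresent());  // false: every mass is "unknown"
    }
}
```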

If you think that there will always be at least one character with a valid mass present, you can just get the name of the heaviest character with optionalHeaviestCharacter.get(). Otherwise you would have to check if the Optional is empty first:

if (optionalHeaviestCharacter.isEmpty()) {
    System.out.println("Could not find a character with the heaviest mass");
} else {
    System.out.println("Heaviest character is " + optionalHeaviestCharacter.get());
}
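As a possible alternative to the explicit isEmpty() check, Optional also offers map, orElse and (since Java 9) ifPresentOrElse. A minimal sketch; the class name OptionalDemo and the helper describe are hypothetical names of mine:

```java
import java.util.Optional;

public class OptionalDemo {

    // Hypothetical helper: builds the message without ever calling get().
    static String describe(Optional<String> optionalHeaviestCharacter) {
        return optionalHeaviestCharacter
                .map(name -> "Heaviest character is " + name)
                .orElse("Could not find a character with the heaviest mass");
    }

    public static void main(String[] args) {
        System.out.println(describe(Optional.of("Grievous")));
        System.out.println(describe(Optional.empty()));

        // ifPresentOrElse (Java 9+) runs one of two actions instead of returning a value.
        Optional.of("Grievous").ifPresentOrElse(
                name -> System.out.println("Heaviest character is " + name),
                () -> System.out.println("Could not find a character with the heaviest mass"));
    }
}
```

Which form reads better is a matter of taste; the map/orElse version is handy when you need the message as a value rather than printing it directly.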

You can just get the name by calling

Streams

As others noted, I doubt streams are the best approach to your particular problem. But since you asked, just for fun, I gave it a try. After much web-searching, and much trial-and-error, I seem to have found a solution using streams.

We use NIO.2 classes Path & Files to open the data file.

We define a stream by calling Files.lines.

We omit the header row by calling Stream#skip.

Some of your input rows have the non-numeric value "unknown" in our target third field. So we call Stream#filter to ignore those lines. We extract the third field by calling String#split and indexing the resulting array with the annoying zero-based index number 2.
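Here is that filtering step in isolation, as a small sketch. The helper name hasKnownMass is my own, and the check for at least three fields is an added assumption to avoid an exception on short lines.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FilterDemo {

    // Hypothetical helper: true when the third field (index 2) is something other than "unknown".
    static boolean hasKnownMass(String line) {
        String[] fields = line.split(";");
        return fields.length > 2 && !fields[2].equalsIgnoreCase("unknown");
    }

    public static void main(String[] args) {
        List<String> kept = Stream.of(
                        "Luke Skywalker;172;77;blond;fair;blue;19BBY;male",
                        "Finn;unknown;unknown;black;dark;dark;unknown;male",
                        "Leia Organa;150;49;brown;light;brown;19BBY;female")
                .filter(FilterDemo::hasKnownMass)
                .collect(Collectors.toList());

        kept.forEach(System.out::println); // Only the Luke and Leia lines remain.
    }
}
```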

To get the highest number in our third column, we can sort. To sort, we extract the third field in a Comparator created via Comparator.comparingInt. To get the needed int value, we parse the text of the third field using Integer.parseInt.

After sorting, we need to access the last element in the stream, as that should be our character with the greatest weight. This seems clumsy to me, but apparently the way to get the last element of a stream is .reduce( ( first, second ) -> second ).orElse( null ). I sure wish we had a Stream#last method!

That last element is a String object, a line of text from your input file. So we need to yet again split the string. But this time when we split, we take the first element rather than the third, as our goal is to report the character's name. The first element is identified by the annoying zero-based index number of 0.

Voilà, we get Grievous as our final result.

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.stream.Stream;

Path path = Paths.get( "/Users/basil_dot_work/inputs.csv" );
if ( Files.notExists( path ) ) { throw new IllegalStateException( "Failed to find file at path: " + path ); }

String result;
try ( Stream < String > lines = Files.lines( path , StandardCharsets.UTF_8 ) )  // try-with-resources closes the file.
{
    result =
            lines
                    .skip( 1L )  // Skip the header row, with column names.
                    .filter(  // Filter out lines whose targeted value is "unknown".
                            line -> ! line.split( ";" )[ 2 ].equalsIgnoreCase( "unknown" )
                    )
                    .sorted(  // Sort by extracting third field’s text, then parse to get an `int` value.
                            Comparator.comparingInt( ( String line ) -> Integer.parseInt( line.split( ";" )[ 2 ] ) )
                    )
                    .reduce( ( first , second ) -> second ) // Get last element, wrapped in an Optional.
                    .map( line -> line.split( ";" )[ 0 ] ) // Extract name of character from first field of the one line left.
                    .orElse( null ); // null if no line carried a usable mass, rather than a NullPointerException.
}
catch ( IOException e ) { throw new RuntimeException( e ); }

System.out.println( "result = " + result );

result = Grievous
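If sorting only to take the last element feels clumsy, Stream#max with the same kind of Comparator may be a simpler alternative, since it finds the largest element in one pass without sorting. This is a sketch of that swapped-in technique, not the code I ran above; it assumes semicolon-separated lines with the header already skipped, and the class and method names are hypothetical.

```java
import java.util.Comparator;
import java.util.Optional;
import java.util.stream.Stream;

public class MaxDemo {

    // Hypothetical helper: name from the line with the largest numeric third field, if any.
    static Optional<String> heaviestName(Stream<String> lines) {
        return lines
                .map(line -> line.split(";"))
                // Assumption: keep only lines whose third field is made up of digits.
                .filter(fields -> fields.length > 2 && fields[2].matches("[0-9]+"))
                .max(Comparator.comparingInt((String[] fields) -> Integer.parseInt(fields[2])))
                .map(fields -> fields[0]);
    }

    public static void main(String[] args) {
        Optional<String> result = heaviestName(Stream.of(
                "Darth Vader;202;136;none;white;yellow;41.9BBY;male",
                "Grievous;216;159;none;brown, white;green, yellow;unknown;male",
                "Finn;unknown;unknown;black;dark;dark;unknown;male"));

        System.out.println("result = " + result.orElse("none")); // result = Grievous
    }
}
```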

Be sure to compare my approach here with that of the other Answer, by Florian Hartung. The other may well be better; I've not yet studied it carefully.

Without streams

For comparison, here is more conventional code, with little or no use of streams.

We read lines from the file in the same manner as seen above.

We need to skip the first row, the header row of column titles. But the List object we get by calling toList on the stream from Files.lines is unmodifiable, so we cannot simply delete the first element of that list. Instead, we effectively skip the first line by calling lines.subList( 1, lines.size() ). The subList method returns a list that is a view onto the original, rather than a new and separate list. This is efficient and appropriate for our use here.
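To illustrate the view behaviour, here is a tiny sketch with an in-memory unmodifiable list standing in for the file's lines. The class name, the helper skipHeader and its guard for an empty list are my own additions for illustration.

```java
import java.util.List;

public class SubListDemo {

    // Returns a view of the list without its first element; no elements are copied.
    // Assumption: an empty list is returned unchanged rather than throwing.
    static List<String> skipHeader(List<String> lines) {
        return lines.isEmpty() ? lines : lines.subList(1, lines.size());
    }

    public static void main(String[] args) {
        List<String> lines = List.of("name;height;mass", "Luke Skywalker;172;77", "C-3PO;167;75");
        List<String> body = skipHeader(lines);
        System.out.println(body.size()); // 2
        System.out.println(body.get(0)); // Luke Skywalker;172;77

        try {
            lines.remove(0); // Unmodifiable lists reject structural changes.
        } catch (UnsupportedOperationException e) {
            System.out.println("cannot remove from an unmodifiable list");
        }
    }
}
```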

We define a class as a record to hold each person's details. We use Integer rather than int so that we can hold a null for the lines that carry unknown text rather than a number.

For each line, we directly transfer the textual items to String member fields. But for height and mass we use a ternary operator to either return null or to instantiate an Integer object.
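That ternary could be pulled into a small helper. The sketch below also treats "n/a" and any other non-numeric text as missing, which is my assumption rather than something the code in this answer does; the name parseOrNull is hypothetical.

```java
public class ParseDemo {

    // Hypothetical helper: the parsed number, or null when the text is not made up of digits only.
    static Integer parseOrNull(String text) {
        return text.matches("[0-9]+") ? Integer.valueOf(text) : null;
    }

    public static void main(String[] args) {
        System.out.println(parseOrNull("159"));     // 159
        System.out.println(parseOrNull("unknown")); // null
        System.out.println(parseOrNull("n/a"));     // null
    }
}
```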

We collect our Person objects by adding to a list.

To get the Person object whose mass is the largest, we need to ignore those with a null mass. So we use a simple stream here to make a new list of Person objects with non-null mass. This stream could be replaced with a conventional loop, but that would be more verbose.

With our filtered list, we call Collections.max while passing a Comparator object that compares the mass member field.
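The filter-then-max step can be tried on its own with a few hand-made objects. This is a sketch only: the nested Person class is a two-field stand-in for the record in this answer, and the early return for an empty list is my addition.

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

public class MaxByMassDemo {

    // Minimal stand-in for the Person record, with only the two fields needed here.
    static final class Person {
        final String name;
        final Integer mass; // null when the mass is unknown

        Person(String name, Integer mass) {
            this.name = name;
            this.mass = mass;
        }
    }

    // Name of the person with the largest non-null mass, or null if there is none.
    static String heaviestName(List<Person> persons) {
        List<Person> withMass = persons.stream()
                .filter(person -> Objects.nonNull(person.mass))
                .collect(Collectors.toList());
        if (withMass.isEmpty()) {
            return null; // Collections.max throws NoSuchElementException on an empty collection.
        }
        return Collections.max(withMass, Comparator.comparing((Person person) -> person.mass)).name;
    }

    public static void main(String[] args) {
        List<Person> persons = List.of(
                new Person("Darth Vader", 136),
                new Person("Grievous", 159),
                new Person("Finn", null));
        System.out.println(heaviestName(persons)); // Grievous
    }
}
```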

We end up with a single Person object. So we interrogate for its name member field.

Voilà, we get Grievous as our final result.

Path path = Paths.get( "/Users/basil_dot_work/inputs.csv" );
if ( Files.notExists( path ) ) { throw new IllegalStateException( "Failed to find file at path: " + path ); }

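List < String > lines;
// try-with-resources closes the file once the lines have been collected.
try ( Stream < String > stream = Files.lines( path , StandardCharsets.UTF_8 ) ) { lines = stream.toList(); } catch ( IOException e ) { throw new RuntimeException( e ); }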
lines = lines.subList( 1 , lines.size() ); // Skip over first line.

record Person( String name , Integer height , Integer mass , String hair_color , String skin_color , String eye_color , String birth_year , String gender ) { }
List < Person > persons = new ArrayList <>();
for ( String line : lines )
{
    String[] parts = line.split( ";" );

    Integer height = ( parts[ 1 ].equalsIgnoreCase( "unknown" ) ) ? null : Integer.valueOf( parts[ 1 ] );
    Integer mass = ( parts[ 2 ].equalsIgnoreCase( "unknown" ) ) ? null : Integer.valueOf( parts[ 2 ] );
    Person person = new Person( parts[ 0 ] , height , mass , parts[ 3 ] , parts[ 4 ] , parts[ 5 ] , parts[ 6 ] , parts[ 7 ] );
    persons.add( person );
}
System.out.println( "persons = " + persons );
List < Person > personsWithMass = persons.stream().filter( person -> Objects.nonNull( person.mass() ) ).toList();
Person heaviestPerson = Collections.max( personsWithMass , Comparator.comparing( Person :: mass ) );

System.out.println( "heaviest Person’s name = " + heaviestPerson.name() );

heaviest Person's name = Grievous
