How to sort quoted strings from a text file in Java

I am trying to read a list of quoted strings, e.g.

"GJKFMN","OUYTV","VFRN","APLUI","DCFUYT","DXSER","JHGF","PIUYT","XSQ" 

from a text file and sort the words in alphabetical order. I also want to score each word using A=1, B=2, ... and sum the letter values of each word.

I have tried the code below for the sorting, but it does not sort the words:

public static void main(String[] args){
    String filePath = null;
    if (args[0] == null || args[0].isEmpty()) {
        System.out.println("Please Enter the Names File Path Enclosed in Double Quotes");
    }
    else {
        filePath = args[0];
    }
    List<String> bufferList = loadDataUsingBufferReader(filePath);
    List<String> listWithoutQuotes = removeQuotes(bufferList);
    listWithoutQuotes.parallelStream().map(String::toUpperCase).sorted().forEach(System.out::println);
}
public static List<String> removeQuotes(List<String> listWithQoutes) {
    listWithQoutes = listWithQoutes.stream().map(s -> s.replaceAll("\"", "")).collect(Collectors.toList());
    return listWithQoutes;
}
public static List<String> loadDataUsingBufferReader(String filePath) {
    final Charset ENCODING = StandardCharsets.UTF_8;
    List<String> lines = new LinkedList<>();
    try {
        final BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(filePath), ENCODING));
        String line;
        while ((line = in.readLine()) != null) {
            lines.add(line);
        }
        in.close();
    } catch (final IOException e) {
        e.printStackTrace();
    }
    return lines;
}

In the code I read the file path from the command line. When I hard-code the input it is sorted, but when I read from a file it is not. Performance is a key factor, as the file could contain millions of words.

Thanks in advance for your help.

Use the following test data, which you can copy and paste into a text file and use as a sample file:

"DSRD","KJHT","BFXXX","OUYTP"
"ABCD","XSHTKK","RTZI","HKLOPQ"
"BGTSZ","ASY","LOMCV","DESRAW"
"VMWEE","ERTZU","GSDFX","BHGFD"
"CD","FRTZU","JUHL","RETZ"

Something like the code below should work. I hope the method names are self-explanatory and that it is clear what happens in each step. I have included some println statements as a debugging aid. You should remove them if you are working with your original files, which are possibly very large.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class Example {

    public static void main(String[] args) throws IOException {
        // args[0] is never null, but args itself may be empty, so check the length first.
        if (args.length == 0 || args[0].isEmpty()) {
            System.out.println("Please Enter the Names File Path Enclosed in Double Quotes");
            return; // nothing to read without a file path
        }
        String filePath = args[0];

        List<String> allLines = readAllLinesFromFile(filePath);
        allLines.forEach(System.out::println);
        System.out.println("**********************");

        List<String> listWithoutQuotes = removeQuotes(allLines);
        listWithoutQuotes.forEach(System.out::println);
        System.out.println("*****************");

        List<String> allWords = getAllWordsFromEachLineSorted(listWithoutQuotes);
        System.out.println(allWords);
        System.out.println("****************");

        List<Integer> scores = calculateStoreForAList(allWords);
        System.out.println(scores);
    }
    static List<String> readAllLinesFromFile(String fileName) throws IOException{
        return Files.readAllLines(Paths.get(fileName));
    }
    public static List<String> removeQuotes(List<String> listWithQoutes) {
        return listWithQoutes.stream()
                .map(s -> s.replaceAll("\"", ""))
                .collect(Collectors.toList());
    }
    public static List<String> getAllWordsFromEachLineSorted(List<String> lines) {
        return lines.stream()
                .map(s -> s.split("\\s*,\\s*"))
                .flatMap(Arrays::stream)
                .sorted()
                .collect(Collectors.toList());
    }

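    static int calculateScore(String word){
        // 'A' is 65 in ASCII, so subtracting 64 maps A=1, B=2, ..., Z=26.
        // Assumes the word contains only uppercase ASCII letters.
        return word.chars()
                .map(i -> i-64)
                .sum();
    }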
    static List<Integer> calculateStoreForAList(List<String> allWords){
        return allWords.stream()
                .map(str -> calculateScore(str))
                .collect(Collectors.toList());
    }
}

You should see something similar to:

"DSRD","KJHT","BFXXX","OUYTP"
"ABCD","XSHTKK","RTZI","HKLOPQ"
"BGTSZ","ASY","LOMCV","DESRAW"
"VMWEE","ERTZU","GSDFX","BHGFD"
"CD","FRTZU","JUHL","RETZ"
**********************
DSRD,KJHT,BFXXX,OUYTP
ABCD,XSHTKK,RTZI,HKLOPQ
BGTSZ,ASY,LOMCV,DESRAW
VMWEE,ERTZU,GSDFX,BHGFD
CD,FRTZU,JUHL,RETZ
*****************
[ABCD, ASY, BFXXX, BGTSZ, BHGFD, CD, DESRAW, DSRD, ERTZU, FRTZU, GSDFX, HKLOPQ, JUHL, KJHT, LOMCV, OUYTP, RETZ, RTZI, VMWEE, XSHTKK]
****************
[10, 45, 80, 74, 27, 7, 70, 45, 90, 91, 60, 79, 51, 49, 65, 97, 69, 73, 68, 93]
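The code in the question appeared not to sort because every line read from the file was kept as one string; the commas were never split, so a one-line file produces a single element and there is nothing to reorder. Since the file may hold millions of words, a variant that streams the lines instead of loading them into a list first might look like the sketch below. The class name `StreamingSorter` and its method names are hypothetical, and the sketch assumes a UTF-8 file with comma-separated, double-quoted words.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper class, not part of the answer above.
public class StreamingSorter {

    // Splits every line on commas, strips the quotes, and returns all words sorted.
    public static List<String> sortedWords(Stream<String> lines) {
        return lines
                .flatMap(line -> Arrays.stream(line.split(",")))
                .map(word -> word.replace("\"", "").trim())
                .filter(word -> !word.isEmpty())
                .sorted()
                .collect(Collectors.toList());
    }

    // Streams the file lazily; try-with-resources closes the underlying file handle.
    public static List<String> sortedWordsFromFile(Path file) throws IOException {
        try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
            return sortedWords(lines);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("Usage: java StreamingSorter <path-to-file>");
            return;
        }
        sortedWordsFromFile(Paths.get(args[0])).forEach(System.out::println);
    }
}
```

Sorting still has to hold every word in memory, so this only avoids keeping the raw lines around as well. Whether a parallel stream helps would need to be measured on the real data.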

After you have removed the double quotes from your text file, I would go with the following steps.

Read the whole file as one string:

Path path = FileSystems.getDefault().getPath(directory, filename);
String fileContent = new String(Files.readAllBytes(path), StandardCharsets.UTF_8);

Split the content into words, since you have a standard delimiter, the comma:

String[] words = fileContent.split(",");

Then sort it using the built-in method of the Arrays class:

Arrays.sort(words);

To calculate each word's score: the ASCII decimal value of capital "A" is 65, so if you subtract 64 from each letter's ASCII decimal value, you get that letter's score. For example:

String abc = "ABC";
int sum = 0;

for (int i = 0; i < abc.length(); ++i){
    sum += (int) abc.charAt(i) - 64;
} 

Here the value of sum is 6.
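Put together, the steps above could be sketched roughly as follows. The class `WordScores` and its methods are hypothetical names used only for illustration. One assumption differs from the snippets above: splitting on a comma alone leaves line breaks inside words when the file spans several lines, so this sketch splits on commas and line breaks. It also strips the quotes itself and assumes the words contain only uppercase ASCII letters.

```java
import java.util.Arrays;

// Hypothetical helper class combining the read/split/sort/score steps.
public class WordScores {

    // Removes the quotes, splits on commas or line breaks, and sorts the words.
    public static String[] sortedWords(String fileContent) {
        String[] words = Arrays.stream(fileContent.replace("\"", "").split("[,\\r\\n]+"))
                .map(String::trim)
                .filter(w -> !w.isEmpty())
                .toArray(String[]::new);
        Arrays.sort(words);
        return words;
    }

    // Scores a word with A=1 ... Z=26; assumes uppercase ASCII letters only.
    public static int score(String word) {
        int sum = 0;
        for (int i = 0; i < word.length(); i++) {
            sum += word.charAt(i) - 'A' + 1;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Inline sample standing in for the file content read in the first step.
        String content = "\"GJKFMN\",\"OUYTV\",\"VFRN\"\n\"APLUI\",\"XSQ\"";
        for (String word : sortedWords(content)) {
            System.out.println(word + " = " + score(word));
        }
    }
}
```

Writing `- 'A' + 1` instead of `- 64` gives the same result and makes the mapping to the alphabet explicit.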
