Writing Unique Lines to a File Using Java

I have written some code that is almost there in terms of how I want it to function. The logic of this Java code is as follows:

  1. Read the source file from the specified location
  2. As we're reading each line, apply the regex to get the capture group result (in this instance, the URL)
  3. After all these lines are read, put the URL and line number into the HashMap
  4. Copy these values into a list, and sort them by increasing line number
  5. Read the source file again
  6. For each line number matched in the list, write that line to our new file

And here is the code:

package preproc;

import java.io.*;
import java.util.*;
import java.util.regex.*;

public class Preproc {

    public static void main(String[] args) {

        File file = new File("C:\\Users\\AnthonyH\\Desktop\\file.txt");
        BufferedReader br;

        HashMap<String, Integer> hmap = new HashMap<>();

        try {

            br = new BufferedReader(new InputStreamReader(new FileInputStream(file)));

            int linenumber = 0;
            String event;

            while ((event = br.readLine()) != null) {

                //System.out.println("LINE=" + event);
                Pattern regex = Pattern.compile("^.*url=(.*)");
                Matcher check = regex.matcher(event);
                if (check.find()) {
                    String match = check.group(1);
                    //System.out.println("GROUP=" + match + " LINE=" + linenumber);
                    if (!hmap.containsKey(match)) {
                        //System.out.println("ADDING TO INDEX");
                        hmap.put(match, linenumber);
                    }
                }

                linenumber++;
            }

            List<Integer> lineNumbers = new ArrayList<>(hmap.values());
            //System.out.println("SIZE=" + lineNumbers.size());
            Collections.sort(lineNumbers);

            File file2 = new File("C:\\Users\\AnthonyH\\Desktop\\file2.txt");
            BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(file)));
            BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(file2)));

            int currentLine = 0;

            for (Integer line : lineNumbers) {

                //System.out.println("LINE=" + line + "CURRENT LINE=" + currentLine);
                while (currentLine < line) {
                    reader.readLine();
                    currentLine++;
                }
                writer.write(reader.readLine());
                writer.newLine();
                currentLine++;
            }

            // Close both streams; the original left the second reader open.
            reader.close();
            writer.close();

        } catch (IOException e) {

            e.printStackTrace();

        }

    }

}

The issue I'm facing is that it writes ALL of the distinct string matches to the HashMap, when I only want to add those that occur exactly once in the original file. For example, with five instances of site1.com and one instance of site2.com, the map will hold the first instance of site1.com and the single instance of site2.com. I would only want site2.com.

All help is greatly appreciated.

Create a Map<String, Occurrence> where Occurrence contains the (first) line number and the number of occurrences of the URL. When writing, ignore the lines for which the number of occurrences is > 1.
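A minimal sketch of this idea follows. The `Occurrence` class, the `OccurrenceFilter` class, and the method name are hypothetical names chosen for illustration, and the lines are assumed to be passed in as a list so the example stays short; the regex is the one from the question.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OccurrenceFilter {

    /** Hypothetical holder: index of the first line with a URL, and how often that URL was seen. */
    static final class Occurrence {
        final int firstLine;
        int count = 1;

        Occurrence(int firstLine) {
            this.firstLine = firstLine;
        }
    }

    // Same pattern as in the question, compiled once instead of once per line.
    private static final Pattern URL = Pattern.compile("^.*url=(.*)");

    /** Returns the lines whose URL appears exactly once, in their original order. */
    static List<String> linesWithUniqueUrl(List<String> lines) {
        // LinkedHashMap keeps first-occurrence order, so no sorting is needed later.
        Map<String, Occurrence> seen = new LinkedHashMap<>();
        for (int i = 0; i < lines.size(); i++) {
            Matcher m = URL.matcher(lines.get(i));
            if (m.find()) {
                Occurrence occ = seen.get(m.group(1));
                if (occ == null) {
                    seen.put(m.group(1), new Occurrence(i));
                } else {
                    occ.count++;
                }
            }
        }

        List<String> result = new ArrayList<>();
        for (Occurrence occ : seen.values()) {
            if (occ.count == 1) { // skip URLs seen more than once
                result.add(lines.get(occ.firstLine));
            }
        }
        return result;
    }
}
```

For the input lines "1 url=site1.com", "2 url=site2.com", "3 url=site1.com", this returns only "2 url=site2.com".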

That's one way; there are others.

You could keep a Set of URLs that are encountered at least twice. As soon as you find a URL that is already in the map, add it to the set. When writing, skip the URLs that are in the set.
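The Set-based variant could look roughly like this. It is a sketch under a few assumptions: the class and method names are made up, a second Set (`seen`) stands in for the map from the question, the files are read and written as UTF-8, and lines without a `url=` part are skipped, as in the original code.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DuplicateSetFilter {

    // Same pattern as in the question, compiled once.
    private static final Pattern URL = Pattern.compile("^.*url=(.*)");

    /** Copies to target only those lines of source whose URL occurs exactly once. */
    static void copyUniqueUrlLines(Path source, Path target) throws IOException {
        Set<String> seen = new HashSet<>();
        Set<String> duplicates = new HashSet<>();

        // Pass 1: Set.add returns false for a URL that was already seen.
        try (BufferedReader reader = Files.newBufferedReader(source, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                Matcher m = URL.matcher(line);
                if (m.find() && !seen.add(m.group(1))) {
                    duplicates.add(m.group(1));
                }
            }
        }

        // Pass 2: write the matching lines whose URL is not in the duplicates set.
        try (BufferedReader reader = Files.newBufferedReader(source, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                Matcher m = URL.matcher(line);
                if (m.find() && !duplicates.contains(m.group(1))) {
                    writer.write(line);
                    writer.newLine();
                }
            }
        }
    }
}
```

Because the second pass streams the file in order, there is no need to collect and sort line numbers, and both files are closed by try-with-resources.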

Note that, if the file isn't too large, you could store the lines in memory rather than re-reading the file.
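As a rough illustration of the in-memory option, the sketch below reads the whole file once, counts each URL, and writes back only the lines whose URL has a count of 1. The class and method names are hypothetical, and the files are assumed to be UTF-8 (the default for `Files.readAllLines(Path)` and `Files.write(Path, Iterable)`).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class InMemoryFilter {

    // Same pattern as in the question, compiled once.
    private static final Pattern URL = Pattern.compile("^.*url=(.*)");

    /** Returns the captured URL, or null when the line has no "url=" part. */
    static String urlOf(String line) {
        Matcher m = URL.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    /** Reads source fully into memory and writes the lines with a once-only URL to target. */
    static void copyUniqueUrlLines(Path source, Path target) throws IOException {
        List<String> lines = Files.readAllLines(source); // whole file held in memory

        // Count how many times each URL occurs.
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            String url = urlOf(line);
            if (url != null) {
                counts.merge(url, 1, Integer::sum);
            }
        }

        // Keep, in original order, the lines whose URL was counted exactly once.
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            String url = urlOf(line);
            if (url != null && counts.get(url) == 1) {
                kept.add(line);
            }
        }
        Files.write(target, kept);
    }
}
```

This trades memory for simplicity: the file is opened once for reading, but every line stays in memory until the output is written.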

package preproc;

import java.io.*;
import java.util.*;
import java.util.regex.*;

public class Preproc {

public static void main(String[] args) {

   File file = new File("C:\\Users\\AnthonyH\\Desktop\\file.txt");
   BufferedReader br;

   HashMap<String, List<Integer>> hmap = new LinkedHashMap<String, List<Integer>>();

    try {

        br = new BufferedReader(new InputStreamReader(new FileInputStream(file)));

        int linenumber = 0;
        String event;

        while ((event = br.readLine()) != null) {

            Pattern regex = Pattern.compile("^.*url=(.*)");
            Matcher check = regex.matcher(event);
            if (check.find()) {
                String match = check.group(1);

                List<Integer> lineNumbers = new ArrayList<Integer>();
                if (hmap.containsKey(match)) {
                    lineNumbers = hmap.get(match);
                }
                lineNumbers.add(linenumber);

                hmap.put(match, lineNumbers);
            }

            linenumber++;
        }

        List<List<Integer>> lineNumbers = new ArrayList<List<Integer>>(hmap.values());

        File file2 = new File("C:\\Users\\AnthonyH\\Desktop\\file2.txt");

        BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(file2)));



        for (List<Integer> linesOccurences : lineNumbers) {

            int currentLine = 0;
            if(linesOccurences.size() == 1)
            {
                BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(file)));
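                // The list has exactly one element here, so its index is 0;
                // get(1) would throw IndexOutOfBoundsException.
                int line = linesOccurences.get(0);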
                while (currentLine++ < line) {
                    reader.readLine();
                }
                writer.write(reader.readLine());
                writer.newLine();
                reader.close();
            }

        }

    writer.close();

    } catch (IOException e) {

        e.printStackTrace();

    }

}
}

Try this edited code. In the previous version, the BufferedReader object was not in the correct place.
