Writing Unique Lines to a File Using Java
I have written some code that almost works the way I want it to. Here is the code:
package preproc;
import java.io.*;
import java.util.*;
import java.util.regex.*;
public class Preproc {
public static void main(String[] args) {
File file = new File("C:\\Users\\AnthonyH\\Desktop\\file.txt");
BufferedReader br;
HashMap<String, Integer> hmap = new HashMap<>();
try {
br = new BufferedReader(new InputStreamReader(new FileInputStream(file)));
int linenumber = 0;
String event;
while ((event = br.readLine()) != null) {
//System.out.println("LINE=" + event);
Pattern regex = Pattern.compile("^.*url=(.*)");
Matcher check = regex.matcher(event);
if (check.find()) {
String match = check.group(1);
//System.out.println("GROUP=" + match + " LINE=" + linenumber);
if (!hmap.containsKey(match)) {
//System.out.println("ADDING TO INDEX");
hmap.put(match, linenumber);
}
}
linenumber++;
}
List<Integer> lineNumbers = new ArrayList<>(hmap.values());
//System.out.println("SIZE=" + lineNumbers.size());
Collections.sort(lineNumbers);
File file2 = new File("C:\\Users\\AnthonyH\\Desktop\\file2.txt");
BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(file)));
BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(file2)));
int currentLine = 0;
for (Integer line : lineNumbers) {
//System.out.println("LINE=" + line + "CURRENT LINE=" + currentLine);
while (currentLine < line) {
reader.readLine();
currentLine++;
}
writer.write(reader.readLine());
writer.newLine();
currentLine++;
}
writer.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
The issue I'm facing is that it adds ALL of the distinct string matches to the HashMap, when I only want those that occur exactly once in the original file. For example, given five instances of site1.com and one instance of site2.com, the map will hold the first instance of site1.com and the single instance of site2.com. I only want site2.com.
All help is greatly appreciated.
Create a Map<String, Occurrence> where Occurrence contains the (first) line number and the number of occurrences of the URL. When writing, ignore the lines for which the number of occurrences is > 1.
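A minimal sketch of this idea, working on an in-memory list of lines so that it is easy to try out. The class name OccurrenceSketch, the Occurrence fields and the method linesWithUniqueUrls are hypothetical names chosen for illustration; only the regex is taken from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OccurrenceSketch {
    // Hypothetical helper type: first line index plus how often the URL was seen.
    static final class Occurrence {
        final int firstLine;
        int count = 1;

        Occurrence(int firstLine) {
            this.firstLine = firstLine;
        }
    }

    // Same pattern as in the question, compiled once instead of once per line.
    private static final Pattern URL = Pattern.compile("^.*url=(.*)");

    // Returns the lines whose URL appears exactly once, in file order.
    static List<String> linesWithUniqueUrls(List<String> lines) {
        // LinkedHashMap keeps URLs in the order of their first appearance.
        Map<String, Occurrence> seen = new LinkedHashMap<>();
        for (int i = 0; i < lines.size(); i++) {
            Matcher m = URL.matcher(lines.get(i));
            if (!m.find()) {
                continue; // lines without "url=" are skipped, as in the question
            }
            String url = m.group(1);
            Occurrence occ = seen.get(url);
            if (occ == null) {
                seen.put(url, new Occurrence(i));
            } else {
                occ.count++;
            }
        }
        List<String> result = new ArrayList<>();
        for (Occurrence occ : seen.values()) {
            if (occ.count == 1) {
                result.add(lines.get(occ.firstLine));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> demo = Arrays.asList(
                "a url=site1.com", "b url=site2.com", "c url=site1.com");
        System.out.println(linesWithUniqueUrls(demo));
    }
}
```

With the example from the question (several site1.com lines and one site2.com line), only the site2.com line is returned.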
That's one way; there are others.
You could have a Set of URLs that are met at least twice. As soon as you find a URL that is already in the map, you add it to the set. When writing, you ignore the URLs that are in the set.
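The Set variant could look roughly like the following sketch. It keeps the question's map of URL to first line number and adds a set of repeated URLs; RepeatedSetSketch and uniqueLineNumbers are hypothetical names, and the input is again a list of lines rather than a file.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedSetSketch {
    private static final Pattern URL = Pattern.compile("^.*url=(.*)");

    // Returns the (0-based) line numbers of URLs that occur exactly once.
    static List<Integer> uniqueLineNumbers(List<String> lines) {
        Map<String, Integer> firstLine = new LinkedHashMap<>();
        Set<String> repeated = new HashSet<>();
        for (int i = 0; i < lines.size(); i++) {
            Matcher m = URL.matcher(lines.get(i));
            if (!m.find()) {
                continue;
            }
            String url = m.group(1);
            if (firstLine.containsKey(url)) {
                repeated.add(url); // already in the map: seen at least twice
            } else {
                firstLine.put(url, i);
            }
        }
        List<Integer> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : firstLine.entrySet()) {
            if (!repeated.contains(e.getKey())) {
                result.add(e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> demo = new ArrayList<>();
        demo.add("x url=site1.com");
        demo.add("y url=site2.com");
        demo.add("z url=site1.com");
        System.out.println(uniqueLineNumbers(demo));
    }
}
```

The returned line numbers are already in ascending order, because the LinkedHashMap iterates in insertion order, so they can be fed to the question's existing write loop without sorting.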
Note that, if the file isn't too large, you could store the lines in memory rather than re-reading the file.
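As a rough illustration of the in-memory option, the sketch below reads the whole file once with java.nio.file.Files, counts the URLs, and then filters the stored lines. The names InMemorySketch, urlOf and writeUniqueLines are hypothetical, and UTF-8 is an assumption (the question's code uses the platform default encoding).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class InMemorySketch {
    private static final Pattern URL = Pattern.compile("^.*url=(.*)");

    // Extracts the URL part of a line, or null if the line has no "url=".
    static String urlOf(String line) {
        Matcher m = URL.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    static void writeUniqueLines(Path input, Path output) throws IOException {
        // Assumption: the file fits in memory and is UTF-8 encoded.
        List<String> lines = Files.readAllLines(input, StandardCharsets.UTF_8);

        // First pass: count how often each URL occurs.
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            String url = urlOf(line);
            if (url != null) {
                counts.merge(url, 1, Integer::sum);
            }
        }

        // Second pass over the stored lines: keep those whose URL occurs once.
        List<String> unique = new ArrayList<>();
        for (String line : lines) {
            String url = urlOf(line);
            if (url != null && counts.get(url) == 1) {
                unique.add(line);
            }
        }
        Files.write(output, unique, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Usage: java InMemorySketch <input file> <output file>
        if (args.length == 2) {
            writeUniqueLines(Paths.get(args[0]), Paths.get(args[1]));
        }
    }
}
```

Because the lines are kept in a list, no second reader and no line-number bookkeeping are needed; the trade-off is memory use proportional to the file size.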
package preproc;
import java.io.*;
import java.util.*;
import java.util.regex.*;
public class Preproc {
public static void main(String[] args) {
File file = new File("C:\\Users\\AnthonyH\\Desktop\\file.txt");
BufferedReader br;
HashMap<String, List<Integer>> hmap = new LinkedHashMap<String, List<Integer>>();
try {
br = new BufferedReader(new InputStreamReader(new FileInputStream(file)));
int linenumber = 0;
String event;
while ((event = br.readLine()) != null) {
Pattern regex = Pattern.compile("^.*url=(.*)");
Matcher check = regex.matcher(event);
if (check.find()) {
String match = check.group(1);
List<Integer> lineNumbers = new ArrayList<Integer>();
if (hmap.containsKey(match)) {
lineNumbers = hmap.get(match);
}
lineNumbers.add(linenumber);
hmap.put(match, lineNumbers);
}
linenumber++;
}
List<List<Integer>> lineNumbers = new ArrayList<List<Integer>>(hmap.values());
File file2 = new File("C:\\Users\\AnthonyH\\Desktop\\file2.txt");
BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(file2)));
for (List<Integer> linesOccurences : lineNumbers) {
int currentLine = 0;
if(linesOccurences.size() == 1)
{
BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(file)));
int line = linesOccurences.get(0); // the list has exactly one element, so its only valid index is 0
while (currentLine++ < line) {
reader.readLine();
}
writer.write(reader.readLine());
writer.newLine();
reader.close();
}
}
writer.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
Try this edited code. In the previous version, the BufferedReader object was not in the correct place.