
Java doesn't iterate through all files in a big directory

I'm doing some data mining for the first time, using the Enron email dataset. I'm trying to iterate through every file in a directory and write the date, time, and sender of each file to a CSV file.

The problem is that Java doesn't seem to iterate through all of them, so my CSV file is around 1000 lines too short. How can I solve this?

My code:

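import java.io.File;
import java.io.FileNotFoundException;
import java.io.PrintWriter;
import java.util.Scanner;

public class FileReader {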


    public static void main(String[] args) throws FileNotFoundException {
        FileReader fileReader = new FileReader();

        //fileReader.mainFunction("maildir/skilling-j/_sent_mail");
        fileReader.mainFunction("maildir/skilling-j/inbox");
        /*fileReader.mainFunction("maildir/skilling-j/sent");
        fileReader.mainFunction("maildir/lay-k/inbox");
        fileReader.mainFunction("maildir/lay-k/_sent");
        fileReader.mainFunction("maildir/lay-k/sent");*/
        System.out.println("done!");
    }

    public void mainFunction(String fileName) throws FileNotFoundException {
        File maindir = new File(fileName);
        PrintWriter pw = new PrintWriter(new File("Analysis.csv"));
        StringBuilder sb = new StringBuilder();
        StringBuilder sbpre = new StringBuilder();

        Scanner scanner;
        sbpre.append("Date");
        sbpre.append(',');
        sbpre.append("Time");
        sbpre.append(",");
        sbpre.append("From");
        sbpre.append('\n');
        int endcounter = 0;
        pw.write(sbpre.toString());
        File[] files = maindir.listFiles();
        for(int i = 0; i < files.length; i++){
            scanner = new Scanner(files[i]);
            System.out.println(files[i].getPath());
            while (scanner.hasNextLine()) {
                String lineFromFile = scanner.nextLine();
                String month = "Jun";
                String year = "2000";
                String time = "00:00:00";
                if(lineFromFile.contains("Date:") & (lineFromFile.length()== 43 | lineFromFile.length()== 42 )){
                    if(lineFromFile.length()==43){
                        sb.append(lineFromFile.substring(11,13));
                        month = lineFromFile.substring(14, 17); 
                        year = lineFromFile.substring(18,22);
                        time = lineFromFile.substring(23,30);
                    }else{
                        sb.append("0");
                        sb.append(lineFromFile.substring(11,12)); 
                        month = lineFromFile.substring(13, 16);
                        year = lineFromFile.substring(17,21);   
                        time = lineFromFile.substring(22,29);
                    }
                    sb.append(".");

                    switch(month){
                    case "Jan":sb.append("01"); sb.append(".");break;
                    case "Feb":sb.append("02"); sb.append(".");break;
                    case "Mar":sb.append("03"); sb.append(".");break;
                    case "Apr":sb.append("04"); sb.append(".");break;
                    case "May":sb.append("05"); sb.append(".");break;
                    case "Jun":sb.append("06"); sb.append(".");break;
                    case "Jul":sb.append("07"); sb.append(".");break;
                    case "Aug":sb.append("08"); sb.append(".");break;
                    case "Sep":sb.append("09"); sb.append(".");break;
                    case "Oct":sb.append("10"); sb.append(".");break;
                    case "Nov":sb.append("11"); sb.append(".");break;
                    case "Dec":sb.append("12"); sb.append(".");break;
                    }
                    sb.append(year);
                    sb.append(",");
                    sb.append(time);
                    sb.append(",");


                }

                if (lineFromFile.contains("X-From:")) {
                    lineFromFile = lineFromFile.replace(",", " ");
                    sb.append(lineFromFile.substring(8));
                }

                pw.write(sb.toString());
                sb.setLength(0);
            }
            sb.append('\n');
            endcounter = i;
        }
        pw.close();
        System.out.println(endcounter);
    }
}

Last lines of the console log:

maildir\skilling-j\inbox\997_
maildir\skilling-j\inbox\998_
maildir\skilling-j\inbox\999_
maildir\skilling-j\inbox\99_
maildir\skilling-j\inbox\9_
1251
done!

There should actually be around 2500 lines.

It would also be nice to know how I can iterate through a directory of directories (e.g. "maildir/skilling-j") instead of a single directory of files.

I know the code is kind of bloated, but that's the result of an incompetent coder (me).

The listFiles() method returns both files and directories. You can use the isFile() and isDirectory() methods to identify the type of each entry. Try this simple code to check what is in your folder:

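    File[] files = maindir.listFiles();
    // listFiles() returns null if maindir is not a directory or cannot be read.
    if (files == null) {
        throw new IllegalStateException("Not a readable directory: " + maindir);
    }
    System.out.println("Files count: " + files.length);
    for (int i = 0; i < files.length; i++) {
        // Print each entry followed by its type.
        System.out.print(files[i].getAbsolutePath());
        if (files[i].isDirectory()) {
            System.out.println(" dir");
        } else if (files[i].isFile()) {
            System.out.println(" file");
        }
    }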

You can use the isDirectory() method to select only the directories and iterate through them.
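
For the second part of the question (walking a directory such as "maildir/skilling-j" that contains subdirectories), one possible approach is Files.walk from java.nio.file, which visits the whole tree so you don't have to recurse by hand. The following is only a minimal sketch: the class name MailDirWalker and the method name listRegularFiles are made up for illustration, and it assumes Java 8 or later.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of the code in the question.
class MailDirWalker {

    // Returns every regular file below root, descending into all subdirectories.
    static List<Path> listRegularFiles(Path root) throws IOException {
        // Files.walk returns a lazy stream that keeps directory handles open,
        // so it is closed with try-with-resources.
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile) // drop the directories themselves
                    .sorted()                     // stable, predictable order
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // The default path is the example directory from the question.
        Path root = Paths.get(args.length > 0 ? args[0] : "maildir/skilling-j");
        if (!Files.isDirectory(root)) {
            System.out.println("Not a directory: " + root);
            return;
        }
        List<Path> files = listRegularFiles(root);
        System.out.println("Regular files found: " + files.size());
    }
}
```

Each Path in the returned list can be turned into a File with toFile() and passed to the existing Scanner loop, so the parsing code would not need to change.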
