File.lastModified() painfully slow!

I'm doing a recursive copy of files and, like xcopy /D, I only want to copy files that are newer than their destination counterparts (I cannot use xcopy directly since I need to alter some files during the copy).

In Java I use lastModified() to check whether the destination file is older than the source file, and it is very slow.

  • Can I speed up the process (maybe using JNI)?
  • Are there any other copy scripts that can do the job better (copy new files + regexp-change some text files)?

Copying the files anyway is not an option, since that would take more time than checking the last-modified date (the copy goes over the network).
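For reference, the check described in the question can be sketched roughly as follows. This is only an illustration under assumptions: the class and method names (`CopyIfNewer`, `needsCopy`, `copyIfNewer`) are hypothetical, and the per-file alteration step the asker mentions is left out.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

public class CopyIfNewer {
    /** Returns true when dst is missing or older than src. */
    public static boolean needsCopy(File src, File dst) {
        // lastModified() returns 0L for a file that does not exist,
        // so a missing destination always compares as older.
        return src.lastModified() > dst.lastModified();
    }

    /** Copies src over dst only when needsCopy says so; returns whether it copied. */
    public static boolean copyIfNewer(File src, File dst) throws IOException {
        if (!needsCopy(src, dst)) {
            return false;
        }
        // COPY_ATTRIBUTES keeps the source timestamp on the copy, so the
        // next run sees equal timestamps and skips the file.
        Files.copy(src.toPath(), dst.toPath(),
                StandardCopyOption.REPLACE_EXISTING,
                StandardCopyOption.COPY_ATTRIBUTES);
        return true;
    }
}
```

Each call to `needsCopy` costs two `lastModified()` lookups, one of which goes to the network drive, which is where the time described above is spent.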

You need to determine why it is so slow.

When you are running the program, what is the CPU utilisation of your process? If it is more than 50% user time, you should be able to optimise your program; if it is less than 20%, there isn't much you can do.

Usually this method is slow because the file you are examining is on disk rather than in memory. If this is the case you need to speed up how you access your disk, or get a faster drive; e.g. an SSD can be 10-100x faster at doing this.

A bulk query might help. You can approximate one by using multiple threads to check the lastModified date, e.g. have a fixed-size thread pool and add a task for each file. The size of the thread pool determines the number of files polled at once.

This allows the OS to re-order the requests to suit the layout on the disk. Note: this is fine in theory, but you have to test whether it makes things faster on your OS/hardware, as it's just as likely to make things slower. ;)
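A minimal sketch of the thread-pool idea, assuming Java 8 or later. The class name `ParallelLastModified` and the method `lastModifiedAll` are made up for illustration, and the pool size is something you would have to tune by measurement.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelLastModified {
    /** Looks up lastModified() for every file, with up to poolSize lookups in flight. */
    public static Map<File, Long> lastModifiedAll(List<File> files, int poolSize)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            // One task per file; the pool size caps how many run at once.
            List<Future<Long>> futures = new ArrayList<>();
            for (File f : files) {
                Callable<Long> task = f::lastModified;
                futures.add(pool.submit(task));
            }
            // Collect the results in the same order the files were given.
            Map<File, Long> result = new LinkedHashMap<>();
            for (int i = 0; i < files.size(); i++) {
                result.put(files.get(i), futures.get(i).get());
            }
            return result;
        } finally {
            pool.shutdown();
        }
    }
}
```

Timing something like `lastModifiedAll(files, 8)` against a plain loop on the real drive is the only way to know whether it helps, for the reason given in the note above.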

So I ran across this on network drives. Painful. I had a directory with 17000+ files in it. On a local drive it took less than 2 seconds to check the last modified dates. On a networked drive it took 58 seconds!!! Of course my app is an interactive app, so I had some complaints.

After some research I decided that it would be possible to implement some JNI code to call the Windows Kernel32 FindFirstFile/FindNextFile/FindClose functions and dramatically improve the process, but then I would have to deal with 32- and 64-bit versions etc. (ugh) and would lose the cross-platform capabilities.

Although it is a bit of a nasty hack, here is what I did. My app operates mostly on Windows, but I didn't want to restrict it to Windows, so I did the following: check whether I am running on Windows; if so, check whether the file is on a local hard disk; if it is not, use the hackish method.

I stored everything case-insensitively. Probably not a great idea for other OSes that may have a directory containing both files 'ABC' and 'abc'. If you need to care about this, you can decide by creating a new File("ABC") and a new File("abc") and then using the equals method to compare them. On case-insensitive file systems like Windows it will return true, but on Unix systems it will return false.
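The equals check can be sketched as below. `CaseCheck`, `pathsAreCaseInsensitive` and `cacheKey` are hypothetical names used only for this illustration. Note that `File.equals` compares path names the way Java does on the current platform; as far as I know it does not probe the actual volume, so treat the result as a platform-level hint.

```java
import java.io.File;
import java.util.Locale;

public class CaseCheck {
    /** True when java.io.File compares path names case-insensitively on this platform. */
    public static boolean pathsAreCaseInsensitive() {
        return new File("ABC").equals(new File("abc"));
    }

    /** Lower-cases the cache key only where names are compared case-insensitively. */
    public static String cacheKey(String fileName) {
        return pathsAreCaseInsensitive() ? fileName.toLowerCase(Locale.ROOT) : fileName;
    }
}
```

Using something like `cacheKey(f.getName())` in place of `f.getName().toLowerCase()` would keep 'ABC' and 'abc' apart on platforms where they are distinct files.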

Although it may be a little hackish, the time it took went from 58 seconds to 1.6 seconds on a network drive, so I can live with the hack.

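    // Fragment of a method that returns long. Assumed context (not shown here):
    // `f` is the java.io.File being checked and `dirCache` is a Map<String, Long>
    // from lower-cased file name to last-modified time in milliseconds.
    // Imports needed: java.io.BufferedReader, java.io.File, java.io.InputStreamReader,
    // java.text.DateFormat, java.text.SimpleDateFormat, java.util.Date,
    // javax.swing.filechooser.FileSystemView.
    boolean useJavaDefaultMethod = true;

    if (System.getProperty("os.name").startsWith("Windows"))
    {
        // Walk up to the root of the path and ask Windows what kind of drive it is.
        File root = f.getAbsoluteFile();
        while (root.getParentFile() != null)
        {
            root = root.getParentFile();
        }
        FileSystemView fsv = FileSystemView.getFileSystemView();
        String s = fsv.getSystemTypeDescription(root);
        // "Local Disk" is the description on English-language Windows;
        // other display languages may report a different string.
        useJavaDefaultMethod = fsv.isDrive(root) && "Local Disk".equalsIgnoreCase(s);
    }
    if (!useJavaDefaultMethod)
    {
        try
        {
            // One `dir` call lists every file in the directory with its timestamp.
            // The directory is passed as its own argument so that ProcessBuilder
            // can quote paths that contain spaces.
            ProcessBuilder pb = new ProcessBuilder("cmd.exe", "/C", "dir", f.getAbsoluteFile().getParent());
            pb.redirectErrorStream(true);
            Process process = pb.start();
            try (BufferedReader br = new BufferedReader(new InputStreamReader(process.getInputStream())))
            {
                String line;
                // The date pattern and the file-name column (38) match the `dir` output on
                // my machine; both depend on the Windows regional settings.
                DateFormat df = new SimpleDateFormat("dd-MMM-yy hh:mm a");
                while ((line = br.readLine()) != null)
                {
                    try
                    {
                        Date filedate = df.parse(line);
                        String filename = line.substring(38);
                        dirCache.put(filename.toLowerCase(), filedate.getTime());
                    }
                    catch (Exception ex)
                    {
                        // Header and summary lines do not start with a date; skip them.
                    }
                }
            }
            process.waitFor();

            Long filetime = dirCache.get(f.getName().toLowerCase());
            if (filetime != null)
                return filetime;
        }
        catch (Exception ex)
        {
            // Fall through to the default method below.
        }
    }

    // this is SO SLOW on a networked drive!
    long lastModifiedDate = f.lastModified();
    dirCache.put(f.getName().toLowerCase(), lastModifiedDate);

    return lastModifiedDate;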

Unfortunately the way Java handles looking up lastModified is slow: it queries the underlying file system for each file as you request the information, and listFiles or similar methods do not bulk-load this data.
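As a point of comparison, the java.nio.file API added in Java 7 hands each file's attributes to the visitor during a directory walk, and on some platforms those attributes may come from the directory listing itself instead of a separate query per file. The following is only a sketch under that assumption: `BulkTimes` and `lastModifiedTimes` are hypothetical names, and whether this is actually faster on a given network drive needs measuring.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.HashMap;
import java.util.Map;

public class BulkTimes {
    /** Walks the tree under root and records each regular file's last-modified time in millis. */
    public static Map<Path, Long> lastModifiedTimes(Path root) throws IOException {
        final Map<Path, Long> times = new HashMap<>();
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                // attrs is supplied by the walk, so no extra lookup is made here.
                times.put(file, attrs.lastModifiedTime().toMillis());
                return FileVisitResult.CONTINUE;
            }
        });
        return times;
    }
}
```

Building one such map for the source tree and one for the destination tree would let the copy loop compare timestamps from memory.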

You could potentially invoke a more efficient native program to do this in bulk, but any such solution would be closely tied to the platform you deploy to.

I imagine you are doing this over the network, otherwise there would be little point in the copy. Network directory operations are slow, bad luck. You could always just copy any file below a certain size threshold without checking, whichever makes the total operation take the least time.

I disagree with Kris here: there's nothing startlingly inefficient in the way Java does it, and in any case it really has to do it that way because you want the latest value.
