
How do I exclude files from a search that may be in use or being copied to in Python?

I'm new to Python, so this might have a simple solution.

At home, I have three computers that are relevant to this situation:

  • File server (Linux)
  • My main PC (Windows)
  • My girlfriend's MacBook Pro

My file server runs Ubuntu and Samba. I've installed Python 3.1 and written my code in 3.1.

I've created a daemon that checks the uploads directory for files matching a given pattern. When it finds such a file, it renames it and moves it to a different location on a different drive. It also rewrites the owner, group, and permissions. All of this works great. The daemon runs this process every minute.

If I copy files from my main PC (running a flavor of Windows), the process always works. (I believe Windows locks the file until it's done copying; I could be wrong.) If my girlfriend copies a file, the daemon picks it up before the copy is complete and things get messy: underscored versions of the files with improper permissions are created, and only occasionally does the file end up in the correct place. I'm guessing that her MacBook does not lock the file while copying, but I could be wrong there too.

What I need is a way to exclude files that are either in use or, failing that, are being created.

For reference, the method I've created to find the files is:

import os

# _GetFileListing(pattern)
# Description: Gets a list of relevant files based on the pattern
#
# Parameters: pattern - a compiled regex
# Returns:
#   Nothing. It populates self.fileList
def _GetFileListing(self, pattern):
    self.fileList = []
    for name in os.listdir(self.dir):
        filepath = os.path.join(self.dir, name)

        # Keep only regular files whose name matches the regex
        if os.path.isfile(filepath) and pattern.search(name) is not None:
            self.fileList.append(filepath)

Note that this is all inside a class.

The method I've created to manipulate the files is:

import errno
import grp
import os
import pwd
import shutil

# _ArchiveFile(filepath, outpath)
# Description: Renames/moves the file to outpath and rewrites the file permissions to the permissions used for
#   the output directory. See self.mask, self.group, and self.owner for the actual values.
#
# Parameters: filepath - path to the file
#             outpath - path to the output file
def _ArchiveFile(self, filepath, outpath):
    outdir, filename, filetype = self._SplitDirectoryAndFile(outpath)

    # Create the output directory; ignore the error only if it already exists
    try:
        os.makedirs(outdir, self.mask)
    except OSError as e:
        if e.errno != errno.EEXIST:
            raise

    uid = pwd.getpwnam(self.owner).pw_uid
    gid = grp.getgrnam(self.group).gr_gid
    # shutil.move instead of os.rename: os.rename cannot move across filesystems
    shutil.move(filepath, outpath)
    os.chmod(outpath, self.mask)
    os.chown(outpath, uid, gid)

I stopped using os.rename because it stopped working once I started moving files to a different drive.

Short version: How do I prevent my search from picking up files that are currently being transferred?

Thank you in advance for any help you can provide.

You can try taking an exclusive write lock on the file before moving it. This can be done with the fcntl module:

http://docs.python.org/library/fcntl.html
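A minimal sketch of that suggestion; the helper name is hypothetical. It uses `fcntl.lockf` (POSIX record locks), which is the kind of lock Samba can map client byte-range locks onto when its `posix locking` option is enabled; whether the Mac client takes any lock during a copy is an assumption. These locks are advisory, so a writer that never locks is invisible to this check.

```python
import errno
import fcntl
import os


def is_locked_by_another_process(path):
    """Return True if another process holds a conflicting POSIX lock on path.

    Advisory locks only: a writer that never takes a lock is not detected.
    """
    fd = os.open(path, os.O_RDWR)  # an exclusive lockf lock needs a writable fd
    try:
        try:
            # Non-blocking exclusive lock over the whole file
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError as e:
            # EACCES or EAGAIN means some other process holds a conflicting lock
            if e.errno in (errno.EACCES, errno.EAGAIN):
                return True
            raise
        # We obtained the lock, so nobody else held one; release it again
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return False
    finally:
        os.close(fd)
```

The daemon would skip any path for which this returns True and retry it on the next pass.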

Failing that, you can use the lsof utility to see which files the system has open. That requires more drudgery.
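One way to call lsof from Python, sketched under assumptions: the `lsof` binary is installed and on `PATH`, the daemon runs with enough privilege (typically root) to see files opened by Samba's `smbd` processes, and Python is 3.5 or later (`subprocess.run` does not exist in the 3.1 mentioned in the question). The helper name is hypothetical.

```python
import subprocess


def is_open_by_any_process(path):
    """Return True if lsof reports at least one process with path open.

    Raises FileNotFoundError if the lsof binary is not installed.
    """
    result = subprocess.run(
        ["lsof", "-t", "--", path],  # -t: print only PIDs, one per line
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,   # hide warnings such as unreadable /proc entries
    )
    # lsof prints nothing (and exits non-zero) when no process has the file open
    return bool(result.stdout.strip())
```

Spawning one lsof process per candidate file is slow for large directories, but for a handful of uploads per minute it should be acceptable.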

Note that os.rename() works within a single filesystem and would actually be immune to this issue: only the directory entry is relinked, and no data is moved. shutil.move behaves like mv: it relinks the file if source and destination are on the same filesystem, or copies it and then deletes the original if they are on different filesystems.
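The distinction can be made explicit by trying `os.rename` first and falling back to `shutil.move` only when the kernel reports a cross-filesystem move (`errno.EXDEV`). A sketch; the function name and its return values are illustrative, not from the original posts.

```python
import errno
import os
import shutil


def move_file(src, dst):
    """Move src to dst, preferring a same-filesystem rename.

    Returns "rename" when os.rename succeeded (directory entry relinked,
    no data copied) or "copy" when the move crossed filesystems and fell
    back to shutil.move's copy-then-delete.
    """
    try:
        os.rename(src, dst)
        return "rename"
    except OSError as e:
        # EXDEV ("Invalid cross-device link"): src and dst are on different filesystems
        if e.errno != errno.EXDEV:
            raise
    shutil.move(src, dst)
    return "copy"
```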

It turns out the write-lock approach didn't work. I guess I didn't test it properly before updating here.

What I've decided to do for now is:

  • Reduce the time between checks to 30 seconds
  • Keep a list of files found in the previous iteration and their respective file sizes
  • Check the new list of files against the old list

If a file appears in both the new and the old list with the same size, it goes into the list of files to be transferred. The remaining files in the new list become the old list, and the process continues.
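The size-stability check described above can be sketched as a pure function that takes the current listing and the previous iteration's sizes. The function name is hypothetical, and one caveat applies: a transfer that stalls for a whole polling interval would still be treated as finished.

```python
import os


def split_stable_files(paths, previous_sizes):
    """Split paths into (ready, pending) by comparing sizes with the last pass.

    ready:   sorted list of paths whose size matches previous_sizes
    pending: dict {path: size} to pass back in as previous_sizes next time
    """
    current = {}
    for path in paths:
        try:
            current[path] = os.path.getsize(path)
        except OSError:
            # The file disappeared between listing and stat(); skip it
            continue
    ready = sorted(p for p, size in current.items() if previous_sizes.get(p) == size)
    ready_set = set(ready)
    pending = {p: s for p, s in current.items() if p not in ready_set}
    return ready, pending
```

In the daemon, `previous_sizes` would start as an empty dict and be replaced by the returned `pending` dict on every 30-second pass, with each path in `ready` handed to `_ArchiveFile`.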

I'm sure the lsof method would work, but I'm not sure how to use it in Python. Besides, the size check should work quite well for my situation, since I'm mostly concerned with not moving files while they're in transit.

I would also have to exclude all files that start with "._", since the Mac creates those and I'm not sure whether they grow over time.

Alternatively, I could handle only the case where the file is being transferred from her Mac. I know that when the Mac transfers a file, it creates:

  • filename.ext
  • ._filename.ext

I could check the list for every filename that has a matching ._ counterpart and exclude those files.
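A sketch of the `._` heuristic, operating on bare file names (for example, the output of `os.listdir`), so full paths from `self.fileList` would need `os.path.basename` first. The function name is hypothetical. It assumes, as the post does, that a `._` companion signals an in-progress Mac transfer; AppleDouble files may remain on the share after the copy finishes, in which case this filter would hold the file back indefinitely.

```python
def exclude_mac_transfers(names):
    """Drop AppleDouble '._' files and any file that has a '._' companion."""
    name_set = set(names)
    result = []
    for name in names:
        if name.startswith("._"):
            continue  # the AppleDouble metadata file itself
        if "._" + name in name_set:
            continue  # companion present: assume the Mac may still be copying
        result.append(name)
    return result
```

For example, `exclude_mac_transfers(["a.txt", "._a.txt", "b.txt"])` keeps only `"b.txt"`.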

I'll probably try the second option first. It's a little dirty, but hopefully it will work.

The ._ files from the Mac contain resource forks. More information can be found here: http://support.apple.com/kb/TA20578

I don't have enough reputation to comment, hence the answer.

For the most part, you can safely ignore them, since other operating systems probably can't do anything with them anyway. More information on them is here: http://en.wikipedia.org/wiki/Resource_fork

Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original source. For any questions, contact yoyou2525@163.com.