
Unable to locate files with long names on Windows with Python

I need to walk through folders with long file names in Windows.

I tried using os.listdir(), but it raises an error on long pathnames, which is bad.

I tried using os.walk(), but it silently ignores pathnames longer than ~256 characters, which is worse.

I tried the magic word workaround described here, but it only works with mapped drives, not with UNC pathnames.

Here is an example with short pathnames that shows that UNC pathnames don't work with the magic word trick.

>>> os.listdir('c:\\drivers')
['nusb3hub.cat', 'nusb3hub.inf', 'nusb3hub.sys', 'nusb3xhc.cat', 'nusb3xhc.inf', 'nusb3xhc.sys']
>>> os.listdir('\\\\Uni-hq-srv6\\router')
['2009-04-0210', '2010-11-0909', ... ]

>>> mw=u'\\\\?\\'
>>> os.listdir(mw+'c:\\drivers')
[u'nusb3hub.cat', u'nusb3hub.inf', u'nusb3hub.sys', u'nusb3xhc.cat', u'nusb3xhc.inf', u'nusb3xhc.sys']
>>> os.listdir(mw+'\\\\Uni-hq-srv6\\router')

Traceback (most recent call last):
  File "<pyshell#160>", line 1, in <module>
    os.listdir(mw+'\\\\Uni-hq-srv6\\router')
WindowsError: [Error 123] The filename, directory name, or volume label syntax is incorrect: u'\\\\?\\\\\\Uni-hq-srv6\\router\\*.*'

Any ideas on how to deal with long pathnames or with Unicode UNC pathnames?

Edit:

Following the suggestions in the comments below, I created some test functions to compare Python 2.7 and 3.3, and I added tests of glob.glob and of os.listdir after os.chdir.

The os.chdir didn't help as expected (see this comment).

glob.glob is the only one that works better in Python 3.3, but only under one condition: using the magic word together with a drive name.

Here is the code I used (it works on both 2.7 and 3.3). I am learning Python now, and I hope these tests make sense:

from __future__ import print_function
import os, glob

mw = u'\\\\?\\'

def walk(root):
    n = 0
    for root, dirs, files in os.walk(root):
        n += len(files)
    return n

def walk_mw(root):
    n = 0
    for root, dirs, files in os.walk(mw + root):
        n += len(files)
    return n

def listdir(root):
    try:
        folders = [f for f in os.listdir(root) if os.path.isdir(os.path.join(root, f))]
        files = [f for f in os.listdir(root) if os.path.isfile(os.path.join(root, f))]
        n = len(files)
        for f in folders:
            n += listdir(os.path.join(root, f))
        return n
    except:
        return 'Crash'

def listdir_mw(root):
    if not root.startswith(mw):
        root = mw + root
    try:
        folders = [f for f in os.listdir(root) if os.path.isdir(os.path.join(root, f))]
        files = [f for f in os.listdir(root) if os.path.isfile(os.path.join(root, f))]
        n = len(files)
        for f in folders:
            n += listdir_mw(os.path.join(root, f))
        return n
    except:
        return 'Crash'

def listdir_cd(root):
    try:
        os.chdir(root)
        folders = [f for f in os.listdir('.') if os.path.isdir(os.path.join(f))]
        files = [f for f in os.listdir('.') if os.path.isfile(os.path.join(f))]
        n = len(files)
        for f in folders:
            n += listdir_cd(f)
        return n
    except:
        return 'Crash'

def listdir_mw_cd(root):
    if not root.startswith(mw):
        root = mw + root
    try:
        os.chdir(root)
        folders = [f for f in os.listdir('.') if os.path.isdir(os.path.join(f))]
        files = [f for f in os.listdir('.') if os.path.isfile(os.path.join(f))]
        n = len(files)
        for f in folders:
            n += listdir_cd(f) # the magic word can only be added the first time
        return n
    except:
        return 'Crash'

def glb(root):
    folders = [f for f in glob.glob(root + '\\*') if os.path.isdir(os.path.join(root, f))]
    files = [f for f in glob.glob(root + '\\*') if os.path.isfile(os.path.join(root, f))]
    n = len(files)
    for f in folders:
        n += glb(os.path.join(root, f))
    return n

def glb_mw(root):
    if not root.startswith(mw):
        root = mw + root
    folders = [f for f in glob.glob(root + '\\*') if os.path.isdir(os.path.join(root, f))]
    files = [f for f in glob.glob(root + '\\*') if os.path.isfile(os.path.join(root, f))]
    n = len(files)
    for f in folders:
        n += glb_mw(os.path.join(root, f))
    return n

def test():
    for txt1, root in [('drive ', r'C:\test'),
                    ('UNC   ', r'\\Uni-hq-srv6\router\test')]:
        for txt2, func in [('walk                    ', walk),
                           ('walk     magic word     ', walk_mw),
                           ('listdir                 ', listdir),
                           ('listdir  magic word     ', listdir_mw),
                           ('listdir              cd ', listdir_cd),
                           ('listdir  magic word  cd ', listdir_mw_cd),
                           ('glob                    ', glb),
                           ('glob     magic word     ', glb_mw)]:
            print(txt1, txt2, func(root))

test()

And here is the result:

  • The number 8 means all the files were found
  • The number 0 means it found nothing, but did not crash
  • Any number between 1 and 7 means it failed halfway without crashing
  • The word Crash means it crashed


Python 2.7
drive  walk                     5
drive  walk     magic word      8      * GOOD *
drive  listdir                  Crash
drive  listdir  magic word      8      * GOOD *
drive  listdir              cd  Crash
drive  listdir  magic word  cd  5
drive  glob                     5
drive  glob     magic word      0
UNC    walk                     6
UNC    walk     magic word      0
UNC    listdir                  5
UNC    listdir  magic word      Crash
UNC    listdir              cd  5
UNC    listdir  magic word  cd  Crash
UNC    glob                     5
UNC    glob     magic word      0

Python 3.3
drive  walk                     5
drive  walk     magic word      8      * GOOD *
drive  listdir                  Crash
drive  listdir  magic word      8      * GOOD *
drive  listdir              cd  Crash
drive  listdir  magic word  cd  5
drive  glob                     5
drive  glob     magic word      8      * GOOD *
UNC    walk                     6
UNC    walk     magic word      0
UNC    listdir                  5
UNC    listdir  magic word      Crash
UNC    listdir              cd  5
UNC    listdir  magic word  cd  Crash
UNC    glob                     5
UNC    glob     magic word      0

Use the 8.3 fallback to avoid the long pathname. Browsing in Windows 7 Explorer, this seems to be what Windows itself does, i.e. every long path has a shorter 'true name':

>>> long_unc="\\\\K53\\Users\\Tolan\\testing\\xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx\\xxxxxxxxxxxxxxxxxxxxxxxxdddddddddddddddddddddwgggggggggggggggggggggggggggggggggggxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx\\esssssssssssssssssssssggggggggggggggggggggggggggggggggggggggggggggggeee"
>>> os.listdir(long_unc)
FileNotFoundError: [WinError 3]

But you can use win32api (pywin32) to build up a shorter version, i.e.:

>>> import win32api
>>> short_unc=win32api.GetShortPathName(win32api.GetShortPathName(win32api.GetShortPathName("\\\\K53\\Users\\Tolan\\testing\\xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx")+"\\xxxxxxxxxxxxxxxxxxxxxxxxdddddddddddddddddddddwgggggggggggggggggggggggggggggggggggxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx") + "\\esssssssssssssssssssssggggggggggggggggggggggggggggggggggggggggggggggeee")
>>> print(short_unc)
\\K53\Users\Tolan\testing\XXXXXX~1\XXXXXX~1\ESSSSS~1
>>> import os
>>> os.listdir(short_unc)
['test.txt']

Clearly you can just fold the win32api.GetShortPathName call into your directory exploration rather than nesting it as in my example. I've done it with 3 calls because if you already have a 'too long' path then win32api.GetShortPathName won't cope with it either, but you can do it per directory and stay below the limit.
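The per-directory idea could be folded into a loop along these lines. This is only a sketch under stated assumptions, not the code from the answer: shorten_stepwise is a hypothetical helper name, the input is assumed to be an absolute path using backslashes, and the shortening function is passed in as a parameter (on Windows it would be win32api.GetShortPathName) so the sketch can be read and imported without pywin32.

```python
import ntpath  # Windows path rules; importable on any platform


def shorten_stepwise(path, shorten):
    """Shorten an absolute Windows path one component at a time.

    shorten is a callable such as win32api.GetShortPathName; passing it in
    keeps this sketch importable where pywin32 is not installed.
    """
    drive, tail = ntpath.splitdrive(path)  # drive is 'C:' or '\\server\share'
    current = drive + '\\'
    for part in tail.split('\\'):
        if not part:
            continue  # skip empty pieces from leading or doubled separators
        # Only one long component is appended before each call, so the
        # argument stays short as long as every earlier step succeeded.
        current = shorten(ntpath.join(current, part))
    return current
```

On Windows this would be called as shorten_stepwise(long_unc, win32api.GetShortPathName). It presumably only helps on volumes where 8.3 short names are enabled, since otherwise there is no shorter name to return.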

To locate files on UNC paths, the magic prefix is \\\\?\\UNC\\ rather than just \\\\?\\ .

Reference: https://msdn.microsoft.com/en-us/library/aa365247(VS.85).aspx#maxpath

So to access //server/share/really/deep/path/etc/etc , you'd need to:

  1. Convert it to unicode (in Python 2, use the unicode() constructor)
  2. Add the magic prefix ( "\\\\?\\UNC\\" ), and
  3. Ensure all directory separators are "\\" (see os.path.normpath() )

Resulting unicode string: \\\\?\\UNC\\server\\share\\really\\deep\\path\\etc\\etc
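The three steps above can be sketched as one helper. This is a minimal sketch rather than the answerer's code: to_extended_path is a hypothetical name, it assumes Python 3 (where str is already Unicode, so step 1 is implicit) and an absolute input path, and it uses ntpath instead of os.path so that Windows path rules apply on any platform.

```python
import ntpath  # Windows flavour of os.path; importable on any platform

EXT_PREFIX = '\\\\?\\'            # the characters \\?\
EXT_UNC_PREFIX = '\\\\?\\UNC\\'   # the characters \\?\UNC\


def to_extended_path(path):
    """Return an absolute Windows path with the extended-length prefix."""
    if path.startswith(EXT_PREFIX):
        return path  # already prefixed, leave untouched
    # normpath turns '/' into '\' and collapses '.' and '..' components
    path = ntpath.normpath(path)
    if path.startswith('\\\\'):
        # UNC path: \\server\share\... becomes \\?\UNC\server\share\...
        return EXT_UNC_PREFIX + path[2:]
    # Drive path: C:\... becomes \\?\C:\...
    return EXT_PREFIX + path
```

With this, os.walk(to_extended_path(root)) would cover both the drive case and the UNC case from the question with a single call, subject to the caveat about os.listdir() discussed in this answer.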

I've only experimented a little (much less than @stenci did), but with Python 2.7 it seems to work OK with os.walk() and to fail with os.listdir().

Caveat: It only works with os.walk() if the starting path for the traversal is within the MAX_PATH limit, and none of the subdirectories in the starting path would push it over the limit either. This is because os.walk() uses os.listdir() on the top directory.
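To check in advance whether a starting directory runs into this caveat, a length test along these lines may help. It is a sketch that assumes the documented MAX_PATH value of 260 characters, which includes the terminating null; exceeds_max_path and entries_over_limit are hypothetical helper names, and they only compare string lengths without touching the file system.

```python
MAX_PATH = 260  # Windows constant; the count includes the terminating NUL


def exceeds_max_path(path):
    """True if path is too long for the classic (non-prefixed) Windows API."""
    return len(path) > MAX_PATH - 1


def entries_over_limit(directory, names):
    """Return the names that exceed the limit once joined to directory."""
    return [n for n in names if exceeds_max_path(directory + '\\' + n)]
```

For example, entries_over_limit(top, subdirectory_names) returning a non-empty list would suggest that the traversal needs to start from a prefixed or shortened path instead.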

In my previous comment I said that the nested recursive call of GetShortPathName is not required. I found that it is not required most of the time, but once in a while it crashes. I wasn't able to figure out when, so I wrote this little function, which has been working smoothly for some time and is what I use now:

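import os
import win32api  # part of the pywin32 package

def short_name(name):
    try:
        return win32api.GetShortPathName(name)
    except win32api.error:
        # The full path was rejected: shorten the parent directory first,
        # then shorten the parent's short name joined with the last component.
        dirname = os.path.dirname(name)
        basename = os.path.basename(name)
        short_dirname = win32api.GetShortPathName(dirname)
        return win32api.GetShortPathName(os.path.join(short_dirname, basename))

# Example use, where name is the path being examined:
# fall back to the short name only when the long one is not found.
try:
    mtime = os.path.getmtime(name)
except FileNotFoundError:
    name = short_name(name)
    mtime = os.path.getmtime(name)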
