Python regular expression for Windows file path

The problem, which may not be easily solved with a regex, is that I want to extract a Windows file path from an arbitrary string. The closest I have come (after trying a bunch of others) is the following regex:

[a-zA-Z]:\\([a-zA-Z0-9() ]*\\)*\w*.*\w*

It picks up the start of the path and is designed to match, after the initial drive letter, a sequence of strings each followed by a backslash, ending with a file name, an optional dot, and an optional extension.

The difficulty is what happens next. Since the maximum path length is 260 characters, I only need to look at 260 characters beyond the start. But since spaces (and other characters) are allowed in file names, I would need to make sure that there are no additional backslashes indicating that the preceding characters are the name of a folder and that what follows isn't the file name itself.
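To make the difficulty concrete, here is a small sketch that runs the regex above with Python's re module. The sample sentence is hypothetical, and the pattern is written as a raw string so that \\ stands for one literal backslash (an assumption about how the paths appear in the input).

```python
import re

# The regex from the question, as a raw string: \\ matches one backslash.
PATTERN = re.compile(r"[a-zA-Z]:\\([a-zA-Z0-9() ]*\\)*\w*.*\w*")

# Hypothetical input: a path embedded in a longer sentence.
text = r"Open C:\Program Files (x86)\Adobe\readme.txt and read it"

match = PATTERN.search(text)

# The match starts at the drive letter but runs to the end of the line,
# because .* happily consumes the words that follow the file name.
print(match.group(0))

# The repeated capture group only keeps its last repetition.
print(match.group(1))
```

The start of the path is found reliably; deciding where the file name ends is the part the regex cannot settle on its own.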

I am pretty certain that there isn't a perfect solution (the perfect being the enemy of the good), but I wondered if anyone could suggest a "best possible" solution?

Here's the expression I got, based on yours, that allows me to get the path on Windows: [a-zA-Z]:\\\\((?:[a-zA-Z0-9() ]*\\\\)*).* . An example of it being used is available here: https://regex101.com/r/SXUlVX/1

First, I changed the capture group from ([a-zA-Z0-9() ]*\\\\)* to ((?:[a-zA-Z0-9() ]*\\\\)*) .
Your original expression captures each XXX\\ segment one after another (e.g. Users\\ ), so the group never holds more than one segment at a time.
Mine repeats the non-capturing group (?:[a-zA-Z0-9() ]*\\\\)* inside a single capture group. This captures the concatenation XXX\\YYYY\\ZZZ\\ as one string, which gives me the full path.

The second change I made is related to the file name: I simply match whatever remains with .* . Because the preceding group is greedy, it has already consumed every folder segment it can, so the remainder is treated as the file name. This allows me to take care of strange file names.
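The difference between the two groupings can be checked with a short Python sketch. The sample path is hypothetical, and the patterns are written as raw strings with \\ matching a single backslash, which assumes the input contains ordinary single-backslash separators.

```python
import re

text = r"C:\Users\fred\Documents\report.txt"  # hypothetical sample path

# Repeated capture group: group(1) is overwritten on every repetition,
# so only the last folder segment survives.
repeated = re.match(r"[a-zA-Z]:\\([a-zA-Z0-9() ]*\\)*.*", text)

# Non-capturing group repeated inside one capture group:
# group(1) holds all folder segments together.
wrapped = re.match(r"[a-zA-Z]:\\((?:[a-zA-Z0-9() ]*\\)*).*", text)

print(repeated.group(1))  # last segment only
print(wrapped.group(1))   # every folder segment

# Whatever follows the captured folders is what .* treated as the file name.
file_name = text[wrapped.end(1):]
print(file_name)
```

Note that the character class [a-zA-Z0-9() ] excludes characters such as dots, hyphens, and underscores, so a folder containing one of them stops the repetition early and ends up in the trailing .* instead.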

Another regex that would work is [a-zA-Z]:\\\\((?:.*?\\\\)*).* , as shown in this example: https://regex101.com/r/SXUlVX/2

This time, I used .*?\\\\ to match the XXX\\ parts of the path.
.*? matches in a non-greedy way: thus, .*?\\\\ matches the bare minimum of text followed by a backslash.
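As a rough illustration of why the non-greedy variant is more forgiving, the sketch below compares it with the character-class version on a hypothetical path whose folder name contains an underscore, a hyphen, and a dot. Both patterns are raw strings in which \\ matches one backslash, which is an assumption about the input.

```python
import re

# Hypothetical path with characters outside [a-zA-Z0-9() ].
text = r"C:\my_app-v1.2\data files\notes.txt"

char_class = re.compile(r"[a-zA-Z]:\\((?:[a-zA-Z0-9() ]*\\)*).*")
non_greedy = re.compile(r"[a-zA-Z]:\\((?:.*?\\)*).*")

strict = char_class.match(text)
loose = non_greedy.match(text)

# The character class cannot get past the underscore in the first folder,
# so the group matches zero segments and captures an empty string.
print(repr(strict.group(1)))

# .*?\\ takes the shortest text up to each backslash, whatever it contains.
print(loose.group(1))
```

The trade-off is that .*? accepts any character, so inside a longer string it will also run across unrelated text up to the last backslash on the line.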

Do not hesitate to ask if you have any questions regarding the expressions.
I'd also encourage you to check how well your expression works using https://regex101.com , which also lists the different tokens you can use in your regex.

Edit: As my previous answer did not work (though I'll need to spend some time finding out exactly why), I looked for another way to do what you want, and I managed to do so using string splitting and joining.
The command is "\\\\".join(TARGETSTRING.split("\\\\")[1:-1]) .
How this works: I split the original string into a list of substrings, using the backslash as the separator. I then remove the first and last parts ( [1:-1] keeps the 2nd element through the one before the last) and join the resulting list back into a string.

This works whether the value given is a directory path or the full address of a file:
Program Files (x86)\\\\Adobe\\\\Acrobat Distiller\\\\acrbd.exe fred is a file path
Program Files (x86)\\\\Adobe\\\\Acrobat Distiller\\\\acrbd.exe fred\\ is a directory path
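A minimal sketch of the split-and-join approach, wrapped in a hypothetical helper named folder_part. It assumes the input starts with a drive letter (the C: prefix is added here for illustration) and uses single backslashes as separators.

```python
def folder_part(target: str) -> str:
    """Drop the first and last backslash-separated pieces of target."""
    return "\\".join(target.split("\\")[1:-1])


# Hypothetical inputs built from the examples above, with an assumed C: drive.
file_path = r"C:\Program Files (x86)\Adobe\Acrobat Distiller\acrbd.exe fred"
dir_path = file_path + "\\"  # a trailing backslash marks a directory

# For a file path, the drive and the file name are dropped.
print(folder_part(file_path))

# For a directory path, the last piece is the empty string after the final
# backslash, so only the drive is effectively dropped.
print(folder_part(dir_path))
```

Because the split produces an empty final piece for a path ending in a backslash, the same slice handles both cases without a separate check.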
