RegEx: exclude the directory, capture the comma-separated parts of the file name, and exclude "(number)" and the extension

Question

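For the past three days I have been trying to build an image/short-video tagging system for my own use, and it has turned out to be a real challenge for me.

These are the strings: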

d:\images\tagging 1\GIFs\kung fu panda, fight.webm
d:\images\tagging 1\GIFs\kung fu panda, fight (2).webm
d:\images\tagging 1\GIFs\kung fu panda 2, fight.webm
d:\images\tagging 1\GIFs\kung fu panda 2, fight (2).webm
d:\images\tagging 1\GIFs\pulp fiction, samuel l. jackson, angry, funny.webm

I tried adapting four patterns to get what I want, without success:

(?<=d:\\images\\tagging\s1\\GIFs\\)([\w\s])+

([a-z0-9]\s?)+

(?<=\\)[^\\]*?(?=\..*$)

[^\\/:*?"<>|\r\n]+$

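1. Almost there, but it does not get past the first comma.

2. This one does almost everything, but I have not found a way to exclude the directory, the (#), and the extension.

3. Taken from the internet. It gets to "l." and stops there; it matches the whole file name, cannot handle the commas the way I want, and captures the (#).

4. Taken from RegexBuddy (yes, I actually bought it out of desperation); it captures the (#) and the extension.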

@timgeb

The goal is to get the parts of the file name without the commas, the (#), and the extension, so:

"kung fu panda" "fight"
"kung fu panda" "fight"
"kung fu panda 2" "fight"
"kung fu panda 2" "fight"
"pulp fiction" "samuel l. jackson" "angry" "funny"

Answer 1

Your question is not very clear, but I think you want to parse file names. If so, I would not recommend re as your main tool.

Instead, take a look at os.path:

import os.path  # Or `import ntpath` for Windows paths on non-Windows systems

# Raw string, so the '\t' in '\tagging' is not read as a tab character
dir_name, file_name = os.path.split(r'd:\images\tagging 1\GIFs\kung fu panda, fight (2).webm')
# dir_name = r'd:\images\tagging 1\GIFs'
# file_name = 'kung fu panda, fight (2).webm'

root, ext = os.path.splitext(file_name)
# root = 'kung fu panda, fight (2)'
# ext = '.webm'

Now you have a simpler problem: removing the number in parentheses.
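
One way to finish the job from here, as a minimal sketch rather than a definitive implementation: the helper name extract_tags is hypothetical, and it assumes the "(n)" counter only ever appears at the very end of the name, just before the extension. It uses ntpath so that the Windows-style paths are split the same way on any operating system.

```python
import ntpath
import re


def extract_tags(path):
    """Return the comma-separated tags from a Windows-style file path."""
    file_name = ntpath.basename(path)            # drop the directory
    root, _ext = ntpath.splitext(file_name)      # drop the extension
    root = re.sub(r'\s*\(\d+\)\s*$', '', root)   # drop a trailing "(n)" counter
    # Split on commas and trim the whitespace around each tag
    return [tag.strip() for tag in root.split(',') if tag.strip()]


print(extract_tags(r'd:\images\tagging 1\GIFs\kung fu panda, fight (2).webm'))
# ['kung fu panda', 'fight']
```

Because splitext only splits at the last dot, a tag such as "samuel l. jackson" keeps its dot.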

Answer 2

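Get the base name, replace the parenthesized integer and the extension with an empty string, then split on commas and strip the whitespace.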

from ntpath import basename
import re
# s is the full path; list(...) is needed on Python 3, where map returns an iterator
list(map(str.strip, re.sub(r'\(\d+\)|\.\w+$', '', basename(s)).split(',')))

Demo:

>>> s = r'd:\images\tagging 1\GIFs\kung fu panda, fight.webm'
>>> list(map(str.strip, re.sub(r'\(\d+\)|\.\w+$', '', basename(s)).split(',')))
['kung fu panda', 'fight']
>>> s = r'd:\images\tagging 1\GIFs\kung fu panda, fight (2).webm'
>>> list(map(str.strip, re.sub(r'\(\d+\)|\.\w+$', '', basename(s)).split(',')))
['kung fu panda', 'fight']
>>> s = r'd:\images\tagging 1\GIFs\kung fu panda 2, fight.webm'
>>> list(map(str.strip, re.sub(r'\(\d+\)|\.\w+$', '', basename(s)).split(',')))
['kung fu panda 2', 'fight']
>>> s = r'd:\images\tagging 1\GIFs\kung fu panda 2, fight (2).webm'
>>> list(map(str.strip, re.sub(r'\(\d+\)|\.\w+$', '', basename(s)).split(',')))
['kung fu panda 2', 'fight']
>>> s = r'd:\images\tagging 1\GIFs\pulp fiction, samuel l. jackson, angry, funny.webm'
>>> list(map(str.strip, re.sub(r'\(\d+\)|\.\w+$', '', basename(s)).split(',')))
['pulp fiction', 'samuel l. jackson', 'angry', 'funny']

Answer 3

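If I understand you correctly, you want the tags at the end of the path (kung fu panda, fight.webm), after 1\GIFs\. If you add more sample strings, I can generalize the code for you. This code only extracts the tags and builds a plain list.

import re

# Raw string, so the backslashes in the paths are kept literally
s = r"""d:\images\tagging 1\GIFs\kung fu panda, fight.webm
d:\images\tagging 1\GIFs\kung fu panda, fight (2).webm
d:\images\tagging 1\GIFs\kung fu panda 2, fight.webm
d:\images\tagging 1\GIFs\kung fu panda 2, fight (2).webm
d:\images\tagging 1\GIFs\pulp fiction, samuel l. jackson, angry, funny.webm"""

lines = s.split('\n')  # Just generate a list of lines
for t in lines:
    # Capture everything that follows "1\GIFs\" up to the end of the line
    data = re.search(r'1\\GIFs\\(.+$)', t)
    print(data.group(1).split(','))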

Output:

['kung fu panda', ' fight.webm']
['kung fu panda', ' fight (2).webm']
['kung fu panda 2', ' fight.webm']
['kung fu panda 2', ' fight (2).webm']
['pulp fiction', ' samuel l. jackson', ' angry', ' funny.webm']

The expression 1\\GIFs\\(.+$) captures everything after 1\GIFs\, that is, the tags at the end of the path.
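
The output above still carries the leading spaces, the "(2)" counter, and the extension. As a hedged sketch of how the same search could be tightened (the names TAG_PART and tags_from_line are hypothetical, and the pattern still assumes the tags sit directly after "1\GIFs\"), a lazy capture followed by an optional "(n)" and the extension leaves only the tag text in the group:

```python
import re

# Lazy (.+?) stops as soon as the rest of the line is an optional "(n)"
# followed by a ".ext" at the very end.
TAG_PART = re.compile(r'1\\GIFs\\(.+?)(?:\s*\(\d+\))?\.\w+$')


def tags_from_line(line):
    """Return the trimmed tags of one path, or [] if the pattern does not match."""
    match = TAG_PART.search(line)
    if match is None:
        return []
    return [tag.strip() for tag in match.group(1).split(',')]


print(tags_from_line(r'd:\images\tagging 1\GIFs\kung fu panda 2, fight (2).webm'))
# ['kung fu panda 2', 'fight']
```

The dot in "samuel l. jackson" does not end the capture early, because it is followed by a space rather than by word characters running to the end of the line.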


Solution 1
3 2016-01-24 16:39:43

Solution 2
1 2016-01-24 20:35:47

Solution 3
0 2016-01-24 16:38:05
