Python regex to extract a file from a zip archive when the filename must contain one pattern and must not contain another
I want to extract one specific file from a zip archive that contains the 3 files below.
The filename should start with 'kpidata_nfile' and should not contain 'fileheader'.
kpidata_nfile_20220919-20220925_fileheader.csv
kpidata_nfile_20220905-20220911.csv
othername_kpidata_nfile_20220905-20220911.csv
Below is the code I have tried:
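from zipfile import ZipFile
import re
import os

for x in os.listdir('.'):
    # Only look at zip archives in the current directory
    if re.match(r'.*\.(zip)', x):
        with ZipFile(x, 'r') as zip:
            for info in zip.infolist():
                # Matches the prefix, but also lets the 'fileheader' file through
                if re.match(r'^kpidata_nfile_', info.filename):
                    zip.extract(info)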
Required output: kpidata_nfile_20220905-20220911.csv
This regex does what you require:
^kpidata_nfile(?:(?!fileheader).)*$
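See this answer for more about the (?:(?!fileheader).)*$ part. In short, the negative lookahead (?!fileheader) is checked before each character is consumed, so the match fails as soon as 'fileheader' begins anywhere after the prefix.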
You can see the regex working on your example filenames here.
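As a quick local check, here is a minimal sketch that runs the regex above against the three filenames from the question. The names PATTERN, names and matches are just illustrative.

```python
import re

# Pattern from the answer: the name must start with the prefix, and no
# position after the prefix may be the start of the text "fileheader".
PATTERN = re.compile(r'^kpidata_nfile(?:(?!fileheader).)*$')

# The three example filenames from the question
names = [
    'kpidata_nfile_20220919-20220925_fileheader.csv',
    'kpidata_nfile_20220905-20220911.csv',
    'othername_kpidata_nfile_20220905-20220911.csv',
]

matches = [n for n in names if PATTERN.match(n)]
print(matches)  # ['kpidata_nfile_20220905-20220911.csv']
```

The first name is rejected by the lookahead and the third by the ^ anchor, so only the required file remains.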
The regex is not particularly readable, so it might be better to use plain Python expressions instead of a regex. Something like:
fname = info.filename
if fname.startswith('kpidata_nfile') and 'fileheader' not in fname:
    zip.extract(info)
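Put together, the string-method check might look like the sketch below. The function names wanted and extract_kpidata and the dest parameter are hypothetical, and the check is applied to the full member name, so a file stored inside a subfolder of the archive would not match the prefix test.

```python
from zipfile import ZipFile


def wanted(name):
    """Return True for names with the required prefix and without 'fileheader'."""
    return name.startswith('kpidata_nfile') and 'fileheader' not in name


def extract_kpidata(zip_path, dest='.'):
    """Extract the matching members of zip_path into dest and return their names.

    Hypothetical helper for illustration; not part of the original answer.
    """
    extracted = []
    with ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if wanted(info.filename):
                archive.extract(info, path=dest)
                extracted.append(info.filename)
    return extracted
```

Calling extract_kpidata on each .zip file found by os.listdir('.') would reproduce the loop from the question with the stricter filter.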