How to find filenames with a specific extension using regex?

How can I grab 'dlc3.csv' and 'spongebob.csv' from the string below via the absolute quickest method, which I assume is regex?

4918,  fx,fx,weapon/muzzleflashes/fx_m1carbine,3.3,3.3,|sp/zombie_m1carbine|weapon|../zone_source/dlc3.csv|csv|../zone_source/spongebob.csv|csv

I've already managed to achieve this using split() and for loops, but it's slowing my program down way too much.

I would post an example of my current code, but it's got a load of other stuff in it, so it would only cause you to ask more questions.

In a nutshell, I'm opening a large 6,000-line .csv file and then using nested for loops to iterate through each line, using .split() to find specific parts of each line. I have many files where I need to scan for specific things on each line. At the moment I've only implemented a couple of features in my Qt program, and it's already taking up to 5 seconds to load some things and up to 10 seconds for others, all of which is due to the nested loops. I've looked at where to use range, where not to, and where to use enumerate. I also use time.time() and logging.info() to measure the speed of each code change. After asking around, I've been told that using a regex is the best option for me, as it would remove the need for many of my for loops. The problem is I have no clue how to use regex. I of course plan on learning it, but if someone could help me out with this, it would be much appreciated.

Thanks.

Edit: just to point out that when scanning each line, the filename is unknown. ".csv" is the only thing that is known. So I basically need the regex to grab every filename before .csv, but of course without grabbing the crap before the filename.

I'm currently looking for .csv using .split('/') and .split('|'), then checking whether .csv is in each list item to grab the 'unknown' filename. Some lines have only 1 filename whereas others have 2+, so I need the regex to account for this too.

You can use this pattern: [^/]*\.csv

Breakdown:

  • [^/] - Any character that's not a forward slash
    • * - Zero or more of them
  • \. - A literal dot. (This is necessary because the dot is a special character in regex.)
  • csv - The literal characters "csv"

For example:

import re

s = '''4918,  fx,fx,weapon/muzzleflashes/fx_m1carbine,3.3,3.3,|sp/zombie_m1carbine|weapon|../zone_source/dlc3.csv|csv|../zone_source/spongebob.csv|csv'''

pattern = re.compile(r'[^/]*\.csv')

result = pattern.findall(s)

Result:

['dlc3.csv', 'spongebob.csv']
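One caveat, offered as an aside rather than as part of the original answer: because [^/] also matches '|', two filenames that sit in the same slash-free stretch of a line come back as a single match. A minimal sketch of a stricter variant, assuming '|' never appears inside a filename (the names pattern_loose, pattern_strict and the tricky string are made up for illustration):

```python
import re

# Original pattern: any run of non-slash characters followed by ".csv"
pattern_loose = re.compile(r'[^/]*\.csv')

# Stricter variant (an assumption, not from the original answer):
# also stop at '|' and require at least one character before ".csv"
pattern_strict = re.compile(r'[^/|]+\.csv')

# Hypothetical line with no '/' in front of either filename
tricky = 'weapon|dlc3.csv|csv|spongebob.csv|csv'

loose_result = pattern_loose.findall(tricky)
strict_result = pattern_strict.findall(tricky)

print(loose_result)   # ['weapon|dlc3.csv|csv|spongebob.csv']
print(strict_result)  # ['dlc3.csv', 'spongebob.csv']
```

On the sample line from the question, where every filename is preceded by a '/', both patterns return the same two names.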

Note: pattern.findall(s) could just as easily be written result = re.findall(r'[^/]*\.csv', s), but for code cleanliness, I prefer naming my regexes. You might consider giving it an even clearer name in your code, like pattern_csv_basename or something like that.
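Since the question is about scanning many lines, here is a rough sketch of how a named, precompiled pattern might be reused across them. The helper find_csv_basenames and the second and third sample lines are hypothetical, and whether this actually beats the split() approach on your data is worth measuring rather than assuming.

```python
import re

# Compile once, outside any loop, and reuse the same object for every line
pattern_csv_basename = re.compile(r'[^/]*\.csv')

def find_csv_basenames(lines):
    """Return one list of .csv basenames per input line (hypothetical helper)."""
    return [pattern_csv_basename.findall(line) for line in lines]

# Stand-in for the lines of a real file; only the first line comes from the question
sample_lines = [
    '4918,  fx,fx,weapon/muzzleflashes/fx_m1carbine,3.3,3.3,|sp/zombie_m1carbine|weapon|../zone_source/dlc3.csv|csv|../zone_source/spongebob.csv|csv',
    '4919,  fx,fx,weapon/muzzleflashes/fx_other,1.0,1.0,|sp/other|weapon|../zone_source/patch.csv|csv',
    '4920,  no filenames on this line',
]

basenames_per_line = find_csv_basenames(sample_lines)
print(basenames_per_line)
# [['dlc3.csv', 'spongebob.csv'], ['patch.csv'], []]
```

Because findall returns a list for each line, lines with one filename, several filenames, or none are all handled the same way, with no inner loop over split() pieces.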

Docs: re, including re.findall

See also: The official Python Regular Expression HOWTO
