How to extract and split a string in multiple parts

I would like to search a directory, which may contain files of different formats, and split the file names based on some pattern:

/path/


somefile.txt 2010-01-01
file.txt 2010-01-02
f.txt 2010-01-03
test.txt 2010-01-04
photo.jpg 2010-01-04
script.py  2010-01-05

In order to get:

somefile.txt 
file.txt 
f.txt 
test.txt 

First, I would like to catch all files whose names contain .txt and split them accordingly:

def catch_txt(path):
    result = [os.path.join(path, f) for f in os.listdir(path)
              if re.search(r"\w+\.\w+\txt", f)]
    splitted_result = [files for files in result
                       if re.split(r"\w+\.\w+\txt", f)]
    # some other stuff
    return splitted_result

But this only gives an empty list.

You can use a list comprehension to get the .txt files:

res = [ i.split(" ")[0] for i in os.listdir(path) if '.txt' in i ]

Your pattern:

r"\w+\.\w+\txt"

looks for:

  1. A word character, one or more times, followed by...
  2. A literal dot, followed by...
  3. A word character, one or more times, followed by...
  4. A tab character (\t), followed by...
  5. The literal characters 'xt'.

So your pattern will match filenames like:

 hello.a    xt
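A quick way to confirm this behavior, as a minimal sketch using only the standard `re` module (the two sample names are made up for illustration):

```python
import re

# The pattern from the question: inside a regex, \t means a tab character.
original = re.compile(r"\w+\.\w+\txt")

# Hypothetical sample names, used only to illustrate the behavior.
with_tab = "hello.a\txt"   # a real tab sits between 'a' and 'xt'
plain = "hello.txt"        # an ordinary .txt file name

matches_tab = original.search(with_tab) is not None
matches_plain = original.search(plain) is not None

print(matches_tab)    # True
print(matches_plain)  # False
```

This is why the list in the question comes back empty: none of the file names contain a tab.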

If you want to match filenames like:

hello.txt

then you need to use a pattern like:

r"\w+\.txt"
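Applied to the listing from the question, a minimal sketch might look like the following. It assumes each entry is a string of the form `name date`, as shown in the question. The helper name `extract_txt_names`, the `^` anchor, and the capture group are illustrative additions, not part of the original answer.

```python
import re

# Anchored variant of the corrected pattern; the group captures the file name.
TXT_NAME = re.compile(r"^(\w+\.txt)\b")

def extract_txt_names(entries):
    """Return the .txt file name found at the start of each entry, if any."""
    names = []
    for entry in entries:
        match = TXT_NAME.match(entry)
        if match:
            names.append(match.group(1))
    return names

# Entries copied from the question's listing.
entries = [
    "somefile.txt 2010-01-01",
    "file.txt 2010-01-02",
    "f.txt 2010-01-03",
    "test.txt 2010-01-04",
    "photo.jpg 2010-01-04",
    "script.py  2010-01-05",
]

txt_names = extract_txt_names(entries)
print(txt_names)  # ['somefile.txt', 'file.txt', 'f.txt', 'test.txt']
```

Note that `\w` does not match hyphens or spaces, so names such as `my-file.txt` would need a broader character class.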

Here's a solution without using re. Assuming your list of file types is short, you can just create a list for each file type:

import os

# Collect regular files (not directories) from the current directory.
files = [f for f in os.listdir('.') if os.path.isfile(f)]
txt_files = []  # create additional lists/loops for each file type
for file in files:
    if file.endswith('.txt'):
        txt_files.append(file)
print(txt_files)
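If the number of file types grows, keeping one list per type can get unwieldy. One possible alternative, sketched here with a hypothetical helper `group_by_extension`, is to group the names by extension in a dictionary using `os.path.splitext`:

```python
import os
from collections import defaultdict

def group_by_extension(names):
    """Group file names by their lowercased extension (e.g. '.txt')."""
    groups = defaultdict(list)
    for name in names:
        _, ext = os.path.splitext(name)
        groups[ext.lower()].append(name)
    return dict(groups)

# Sample names taken from the question's listing.
names = ["somefile.txt", "file.txt", "photo.jpg", "script.py"]
groups = group_by_extension(names)
print(groups.get(".txt", []))  # ['somefile.txt', 'file.txt']
```

In practice the `names` list could come from `os.listdir`, as in the answer above.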
