
Getting file names without file extensions with glob

I'm searching for .txt files only:

from glob import glob
result = glob('*.txt')

>>> result
['text1.txt','text2.txt','text3.txt']

but I'd like the result without the file extensions:

>>> result
['text1','text2','text3']

Is there a regex pattern that I can use with glob to exclude the file extensions from the output, or do I have to use a list comprehension on result?

There is no way to do that with glob(). You need to take the list it returns and build a new one that holds the values without the extension:

import os
from glob import glob

[os.path.splitext(val)[0] for val in glob('*.txt')]

os.path.splitext(val) splits each file name into a (root, extension) pair. The [0] just selects the root, i.e. the file name without its extension.
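To make the return value concrete, here is a small sketch of what os.path.splitext gives back for a few names. The sample names are made up for illustration; only the last dot-separated part counts as the extension, and a leading dot is not treated as one.

```python
import os

# splitext() returns a (root, extension) tuple; index 0 is everything before the last extension.
examples = ['text1.txt', 'archive.tar.gz', 'docs/text1.txt', '.bashrc', 'README']
split = {name: os.path.splitext(name) for name in examples}

for name, (root, ext) in split.items():
    print(f'{name!r:20} -> root={root!r}, ext={ext!r}')
```

Note that a directory prefix stays in the root, so 'docs/text1.txt' becomes 'docs/text1'.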

Since you're trying to split off a filename extension, not split an arbitrary string, it makes more sense to use os.path.splitext (or the pathlib module). While it's true that it makes no practical difference on the only platforms that currently matter (Windows and *nix), it's still conceptually clearer what you're doing. (And if you later start using path-like objects instead of strings, it will continue to work unchanged, to boot.)

So:

paths = [os.path.splitext(path)[0] for path in paths]
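For the pathlib route mentioned above, a minimal sketch might look like the following. The helper name txt_stems and the temporary directory are assumptions made purely for the demo; Path.glob and Path.stem are the standard pathlib calls doing the work.

```python
from pathlib import Path
import tempfile

def txt_stems(directory):
    """Return the sorted names of the .txt files in `directory`, minus the final extension."""
    return sorted(p.stem for p in Path(directory).glob('*.txt'))

# Demo in a throwaway directory so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    for name in ('text1.txt', 'text2.txt', 'notes.md'):
        (Path(tmp) / name).touch()
    stems = txt_stems(tmp)
    print(stems)
```

Like os.path.splitext, Path.stem strips only the last suffix, so a file named text2.txt.txt would come back as text2.txt.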

Meanwhile, if this really offends you for some reason, what glob does under the covers is just call fnmatch to turn your glob expression into a regular expression and then apply that to all of the filenames. So, you can replace it by writing the regex yourself and using capture groups:

import os
import re

rtxt = re.compile(r'(.*)\.txt')
files = (rtxt.fullmatch(file) for file in os.listdir(dirpath))
files = [match.group(1) for match in files if match]

This way, you're not doing a listcomp on top of the one that's already in glob; you're doing one instead of the one that's already in glob. I'm not sure if that's a useful win or not, but since you seem to be interested in eliminating a listcomp…
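One detail worth checking in a hand-written pattern like this is anchoring: re.match only anchors at the start of the string, so a pattern ending in \.txt can also accept names that merely contain ".txt". The sketch below compares an unanchored match with re.fullmatch on some invented file names.

```python
import re

names = ['text1.txt', 'text2.txt.txt', 'notes.txt.bak', 'readme.md']

unanchored = re.compile(r'(.*?)\.txt')  # used with match(): anchored at the start only
anchored = re.compile(r'(.*)\.txt')     # used with fullmatch(): must consume the whole name

loose = [m.group(1) for m in map(unanchored.match, names) if m]
strict = [m.group(1) for m in map(anchored.fullmatch, names) if m]

print(loose)   # includes 'notes', taken from 'notes.txt.bak'
print(strict)  # only names that really end in '.txt'
```

With fullmatch, notes.txt.bak is rejected and text2.txt.txt keeps its inner ".txt", which is closer to what glob('*.txt') would select.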

Use index slicing:

result = [i[:-4] for i in result]
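Slicing works here because every name returned by glob('*.txt') ends in a four-character extension. As a hedged alternative, assuming Python 3.9 or later, str.removesuffix drops the suffix only when it is actually present. The sample names below are invented to show the difference.

```python
names = ['text1.txt', 'text2.txt.txt', 'data.json']

sliced = [n[:-4] for n in names]                    # blindly drops the last 4 characters
trimmed = [n.removesuffix('.txt') for n in names]   # drops '.txt' only when the name ends with it

print(sliced)
print(trimmed)
```

The slice mangles data.json into "data.", while removesuffix leaves it untouched.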

Another way, using rsplit:

>>> result = ['text1.txt','text2.txt.txt','text3.txt']
>>> [x.rsplit('.txt', 1)[0] for x in result]
['text1', 'text2.txt', 'text3']

You could do it as a list comprehension:

result = [x.rsplit(".txt", 1)[0] for x in glob('*.txt')]

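This glob selects only files that have no extension: **/*/!(*.*) (note that this is extended-glob syntax, as found in shells such as bash with extglob enabled; Python's glob module does not support it)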
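In Python, a rough equivalent can be sketched by walking the tree and filtering on the name. The helper files_without_extension and the sample directory layout are assumptions for illustration; this approximates the pattern rather than reproducing its exact semantics.

```python
from pathlib import Path
import tempfile

def files_without_extension(root):
    """Return files under `root` (recursively) whose names contain no dot."""
    root = Path(root)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob('*')
        if p.is_file() and '.' not in p.name
    )

# Demo tree built in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    for rel in ('sub/README', 'sub/a.txt', 'sub/deep/Makefile'):
        path = Path(tmp) / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    found = files_without_extension(tmp)
    print(found)
```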

Use str.split (this only works if the file names contain no dots other than the one before the extension):

>>> result = [r.split('.')[0] for r in glob('*.txt')]
>>> result
['text1', 'text2', 'text3']
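A caveat with split('.')[0] is that it cuts at the first dot, so names with more than one dot lose part of their stem. The sketch below, using made-up names, compares it with rsplit('.', 1), which cuts at the last dot.

```python
names = ['text1.txt', 'my.notes.txt', 'v1.2.txt']

first_dot = [n.split('.')[0] for n in names]      # everything before the FIRST dot
last_dot = [n.rsplit('.', 1)[0] for n in names]   # everything before the LAST dot

print(first_dot)
print(last_dot)
```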
