
Printing the name of a file

import csv
import glob

f = glob.glob('/fulldirectory/*.txt')

for index, files in enumerate(f, 1):
    r = open(files)
    reader = csv.DictReader(r)

So I am trying to print the actual name of each file as part of my analysis.

Each file in the directory above follows this naming convention: R1.txt, R2.txt, R3.txt, etc.

At the moment I am simply using enumerate to print the number, but this only works if no files are missing from the directory.
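To make the problem concrete, here is a small sketch with a made-up directory listing in which R2.txt is missing. enumerate() only counts loop iterations, so its counter drifts away from the number in the file name as soon as one file is absent.

```python
# Hypothetical listing of /fulldirectory in which R2.txt is missing.
names = ["R1.txt", "R3.txt", "R4.txt"]

# enumerate() only counts iterations; it never looks at the names themselves.
pairs = list(enumerate(names, 1))
for index, name in pairs:
    print(index, name)  # the second line is "2 R3.txt", not "2 R2.txt"
```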

EDIT:

I tried this, but it's not giving me quite what I want:

p = [int(s) for s in files if files.isdigit()]

print p

>[0,1]
>[0,2]
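The attempt above tests `files.isdigit()`, i.e. the whole path string, instead of each character `s`, so the filter never does what was intended. A minimal sketch of the intended idea, using a hypothetical helper `number_from_name`, tests each character of the base name. It assumes every name contains at least one digit (otherwise `int('')` raises ValueError) and that all digits in the name belong to one number (a name like R1v2.txt would give 12).

```python
import os


def number_from_name(path):
    """Return the digits in a file name such as '/fulldirectory/R12.txt' as an int."""
    # Look only at the file name, so digits in directory names are ignored.
    name = os.path.basename(path)
    # Test each character (ch.isdigit()), not the whole string, then join the digits.
    digits = "".join(ch for ch in name if ch.isdigit())
    return int(digits)


print(number_from_name("/fulldirectory/R12.txt"))  # 12
```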

You can just use a simple re.sub to replace the .txt extension with an empty string.

import csv, glob, re
f = glob.glob('/fulldirectory/*.txt')
for file in f:
    print(re.sub(r'\.txt$', '', file))  # the path without its trailing .txt
    r = open(file)
    reader = csv.DictReader(r)
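A quick way to see what the substitution does is to run it on a few sample strings (the paths below are made up). The raw string keeps the backslash for the regex engine, and the `$` anchor means only a `.txt` at the very end is removed; note that the directory part of the path is kept.

```python
import re


def strip_txt(path):
    # r"\.txt$": an escaped (literal) dot, then "txt", anchored at the end.
    return re.sub(r"\.txt$", "", path)


for path in ["/fulldirectory/R1.txt", "/fulldirectory/R2.txt", "R3.txt.bak"]:
    print(strip_txt(path))
# /fulldirectory/R1
# /fulldirectory/R2
# R3.txt.bak   (unchanged: ".txt" is not at the end)
```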

In an ideal world I would print 'index', and on the first iteration R01 would be printed, then R02, etc.
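One way to get labels like R01 from names like R1.txt is to parse the number and re-format it with zero padding. This is a sketch with a hypothetical helper `label`; it assumes the names follow the R<number>.txt convention exactly and that two-digit padding is wanted.

```python
import os
import re


def label(path):
    """Hypothetical helper: turn '/fulldirectory/R1.txt' into 'R01'."""
    name = os.path.basename(path)
    # Assumes names follow the R<number>.txt convention exactly.
    match = re.fullmatch(r"R(\d+)\.txt", name)
    if match is None:
        raise ValueError(f"unexpected file name: {name!r}")
    # :02d zero-pads to two digits, so 1 -> "01" while 12 stays "12".
    return f"R{int(match.group(1)):02d}"


print(label("/fulldirectory/R1.txt"))   # R01
print(label("/fulldirectory/R12.txt"))  # R12
```

Because the label comes from the file name itself, a missing R2.txt no longer shifts the numbering the way enumerate() does.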

If you want them to always be in order, sort the file names first (note that sorted() compares the names as strings, so R10.txt comes before R2.txt):

f = sorted(glob.glob('/fulldirectory/*.txt'))
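The string-ordering caveat is easy to check with a few made-up names. Plain sorted() compares character by character, so the result is alphabetical rather than numeric.

```python
names = ["R2.txt", "R10.txt", "R1.txt"]

# "R10.txt" sorts before "R2.txt" because the character "1" < "2".
print(sorted(names))  # ['R1.txt', 'R10.txt', 'R2.txt']
```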

If you only want to print the base name of the file, you can print this instead:

import os
print(re.sub(r'\.txt$', '', os.path.basename(file)))

Note: the split-based approach suggested in another answer may not be very safe: splitting on '.' truncates names that contain more than one dot, and splitting on '\\' only works with Windows path separators.
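The dot problem can be seen with a hypothetical name that contains two dots. For comparison, the sketch also uses os.path.splitext() from the standard library, which neither answer uses but which splits off only the last extension.

```python
import os

name = "R1.backup.txt"  # hypothetical name containing more than one dot

# Splitting on "." and keeping the first piece drops everything after the first dot.
print(name.split(".")[0])         # R1
# os.path.splitext() splits off only the last extension.
print(os.path.splitext(name)[0])  # R1.backup
```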

Here is a complete example with a detailed explanation, as the OP requested:

import re, os, glob
file_list = glob.glob('/fulldirectory/*.txt') # get the list of file names that end in .txt
f = sorted(file_list, key=lambda x: int(re.findall(r'(\d+)\.txt$', os.path.basename(x))[0]))
    # 1               2   3         8   4          5               6                   7
for file in f:
    print(re.sub(r'\.txt$', '', file))
          # 9
    # do your stuff....
  1. The sorted() function sorts the list of file names and stores the result in f (f is the sorted version of file_list).

  2. The key argument is a function that takes one argument and returns a sortable object (e.g. str, int, list ...); sorted() orders the items by these returned keys.

  3. lambda creates an anonymous function that takes the argument x; it works the same way as def NoName(x): return something

  4. re.findall finds all substrings that match the regex. In this case there should be only one match (e.g. 'abc123.txt' returns ['123']).

  5. r'(\d+)\.txt$' is a regex: \d+ matches one or more digits, and the parentheses make it a capturing group, so re.findall returns only the digits rather than the whole match. \. is a literal dot: in a regex, . normally means "any character", and the backslash escapes it so that it matches only a literal dot. txt is matched literally at that position, and $ makes the pattern match only at the end of the string. The r prefix makes this a raw string, so Python passes the backslashes to the regex engine unchanged.

  6. os.path.basename() retrieves the base name, i.e. the final part of the path (e.g. 'abc123.txt' from '/a/b/c/abc123.txt').

  7. re.findall() always returns a list, so retrieving the only match requires [0] (e.g. ['123'][0] => '123').

  8. Because the retrieved data is a string, int() converts it to an int so that the file names are compared numerically. This int is what the key function from #2 returns.

  9. In re.sub(r'\.txt$', '', file), the first argument is a regex, the second is the replacement string, and the third is the string to search (e.g. re.sub('a', '', 'banana') => 'bnn', because it replaces every a with nothing). See #5 for more information about the regex.
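The sort key can be checked in isolation on made-up paths. The sketch also shows why a capturing group matters: without one, re.findall returns the whole match (digits plus ".txt"), which int() cannot parse. `file_number` is a hypothetical name for the lambda used as the key.

```python
import os
import re

# Without a group, findall returns the whole match.
print(re.findall(r"\d+\.txt$", "abc123.txt"))    # ['123.txt']
# With a capturing group, findall returns only the group's text.
print(re.findall(r"(\d+)\.txt$", "abc123.txt"))  # ['123']


def file_number(path):
    """Sort key: the number just before '.txt' in the file name."""
    return int(re.findall(r"(\d+)\.txt$", os.path.basename(path))[0])


paths = ["/fulldirectory/R10.txt", "/fulldirectory/R2.txt", "/fulldirectory/R1.txt"]
print(sorted(paths, key=file_number))
# ['/fulldirectory/R1.txt', '/fulldirectory/R2.txt', '/fulldirectory/R10.txt']
```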

If you need any more clarification, let me know.

An alternative way of importing is this:

import glob
from re import sub, findall
from os.path import basename
file_list = glob.glob('/fulldirectory/*.txt') # get the list of file names that end in .txt
f = sorted(file_list, key=lambda x: int(findall(r'(\d+)\.txt$', basename(x))[0]))

for file in f:
    print(sub(r'\.txt$', '', file))

    # do your stuff....
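For the "do your stuff" part, here is one possible way to combine numeric sorting, the name without its extension, and csv.DictReader. It is only a sketch: `process_directory` and `file_number` are hypothetical helper names, every file is assumed to follow the R<number>.txt convention, and each file is assumed to be a CSV with a header row.

```python
import csv
import glob
import os
import re


def file_number(path):
    # Numeric part of names like R12.txt; assumes every file follows that convention.
    match = re.search(r"(\d+)\.txt$", os.path.basename(path))
    return int(match.group(1))


def process_directory(directory):
    """Yield (name, rows) for each *.txt file in numeric order (hypothetical helper)."""
    paths = sorted(glob.glob(os.path.join(directory, "*.txt")), key=file_number)
    for path in paths:
        name = os.path.splitext(os.path.basename(path))[0]  # e.g. "R1"
        # newline="" is what the csv docs recommend for files handed to a csv reader;
        # the with block closes the file once its rows have been read.
        with open(path, newline="") as handle:
            rows = list(csv.DictReader(handle))
        yield name, rows
```

Usage would be along the lines of `for name, rows in process_directory('/fulldirectory'): print(name, len(rows))`. Reading each file inside a `with` block closes it before the next one is opened, which the loops above (using a bare `open()`) do not do.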

Helpful links:

https://docs.python.org/3/library/re.html

https://docs.python.org/3/library/os.path.html

https://docs.python.org/3/tutorial/

Just do it like this:

import glob, os

f = glob.glob('/fulldirectory/*.txt')

for files in f:
    # split on the platform's path separator rather than a hard-coded backslash
    print(files.split(os.sep)[-1].split('.')[0])
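The standard library's pathlib module (not used in either answer) offers the same result without manual splitting: `Path.stem` is the last path component with its final suffix removed, and `Path('/fulldirectory').glob('*.txt')` can replace glob.glob. A small sketch, with `stem_of` as a hypothetical helper name:

```python
from pathlib import Path


def stem_of(path):
    # Path.stem: the final path component minus its last suffix.
    return Path(path).stem


print(stem_of("/fulldirectory/R1.txt"))  # R1
print(stem_of("R1.backup.txt"))          # R1.backup
```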

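Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.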
