Python, string slicing (getting file names from a list of file locations)

I am trying to get the file names from a list of file locations. I think it involves string slicing.

Here is what I worked out:

L = ['C:\\Design\dw\file4.doc',
'C:\\light\PDF\downloads\list.doc',
'C:\\Design\Dq\file4g.doc',
'C:\\Design\Dq\file4r.doc',
'C:\\Design\Dq\file4k.doc',
'C:\\Design\Dq\ole.doc',
'C:\\GE\easy\file\os_references(9).doc',
'C:\\mate\KLO\Market\BIZ\KP\who\Documents\REF.doc']

LL = []

for a in L:
    b = a.split('\\')
    for c in b:
        if c.endswith('.doc'):
            c.replace('.doc', '')
            LL.append(c)

print LL

Question 1: the output still contains '.doc'. Why, and how can I remove it?

Question 2: what's a better way to get the file names?

Thanks.

The answer to the first question is that strings are immutable: .replace() doesn't modify the string in place, it returns a new string, viz:

blaize@bolt ~ $ python 
>>> s = "foobar"
>>> s2 = s.replace("o", "x")
>>> print s
foobar
>>> print s2
fxxbar

My answer to the second question follows:

# I use ntpath because I'm running on Linux.
# This way is more robust if you know you'll be dealing with Windows paths.
# An alternative is to import from os.path then linux filenames will work 
# in Linux and Windows paths will work in Windows.
from ntpath import basename, splitext

# Use r"" strings as people rightly point out.
# "\n" does not do what you think it might.
# See here: https://docs.python.org/2.0/ref/strings.html.
docs = [r'C:\Design\dw\file4.doc',
        r'C:\light\PDF\downloads\list.doc',
        r'C:\Design\Dq\file4g.doc',
        r'C:\Design\Dq\file4r.doc',
        r'C:\Design\Dq\file4k.doc',
        r'C:\Design\Dq\ole.doc',
        r'C:\Design/Dq/test1.doc',  # test a corner case
        r'\\some_unc_machine\Design/Dq/test2.doc',  # test a corner case
        r'C:\GE\easy\file\os_references(9).doc',
        r'C:\mate\KLO\Market\BIZ\KP\who\Documents\REF.doc']

# Please use meaningful variable names:
basenames = []

for doc_path in docs:

    # Please don't reinvent the wheel.
    # Use the builtin path handling functions.
    # File naming has a lot of exceptions and weird cases 
    # (particularly on Windows).
    file_name = basename(doc_path)
    file_basename, extension = splitext(file_name)
    if extension == ".doc":
        basenames.append(file_basename)

print basenames

Best of luck mate. Python is an excellent language.

[file.split('\\')[-1].split('.')[0] for file in L]

You're actually not doing any slicing in your example. You are splitting and replacing. Since we know the file name and extension will always be the last part of a path, we can use a negative index to access it after splitting.

Once we split again on the period, the file name will always be the 0th element, so we can just grab that and add it to a list.

EDIT: I just noticed that this method will have problems with paths that contain \f, since in a non-raw string literal Python treats \f as an escape sequence (a form feed character).
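To make the \f problem concrete, here is a minimal sketch comparing a normal literal with a raw one. The variable names are made up for illustration, and the backslashes other than \f are doubled in the normal literal so that only the one escape is in play. It is written to run on both Python 2 and Python 3.

```python
# Normal literal: \f is read as one form-feed character (\x0c).
plain = 'C:\\Design\\dw\file4.doc'
# Raw literal: the backslash and the f are kept as two characters.
raw = r'C:\Design\dw\file4.doc'

# Same split-based approach as in the one-liner above.
plain_name = plain.split('\\')[-1].split('.')[0]
raw_name = raw.split('\\')[-1].split('.')[0]

print(repr(plain_name))  # the separator before "file4" is gone
print(repr(raw_name))    # 'file4'
```

With the normal literal, the last separator no longer exists in the string, so splitting cannot find the file name; with the raw literal the split works as intended.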

Try this if there are no spaces or other symbols in the file names:

import re
[re.findall(r'\w+\.doc$', x) for x in L]

Try taking a look at the ntpath module.

First, the replace method returns a new string with the replaced value. It does not change the original string. So you need to do:

c = c.replace('.doc', '')
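Put back into the loop from the question, the fix might look like the sketch below. The list is shortened to two raw-string paths and the variable names are illustrative, not the asker's.

```python
paths = [r'C:\Design\dw\file4.doc',
         r'C:\light\PDF\downloads\list.doc']

names = []
for path in paths:
    for part in path.split('\\'):
        if part.endswith('.doc'):
            # replace() returns a new string, so bind the result to a name.
            part = part.replace('.doc', '')
            names.append(part)

print(names)  # ['file4', 'list']
```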

First answer: replace returns a copy of the string, so your changes are not saved unless you assign the result.
Second answer: You need the raw representation of several of the paths, because combinations like '\f' are interpreted as an escape sequence for a single character.
So the tricky part is converting the strings to their raw representation. For this I've used the raw() function from this answer.
Once we have this function, we can manipulate the strings reliably.
I've used re.split to accept both Unix and DOS format paths:

>>> L = [re.split(r'[\/\\]', raw(path)) for path in L]
>>> L
[['C:', 'Design', 'dw', 'file4.doc'], ['C:', 'light', 'PDF', 'downloads', 'list.doc'], ['C:', 'Design', 'Dq', 'file4g.doc'], ['C:', 'Design', 'Dq', 'file4r.doc'], ['C:', 'Design', 'Dq', 'file4k.doc'], ['C:', 'Design', 'Dq', 'ole.doc'], ['C:', 'GE', 'easy', 'file', 'os_references(9).doc'], ['C:', 'mate', 'KLO', 'Market', 'BIZ', 'KP', 'who', 'Documents', 'REF.doc']]

Now L contains a list of path parts for each path, so you can get the file name with its extension by taking the last element of every list:

>>> L_names = [path_parts[-1] for path_parts in L if path_parts[-1].endswith('.doc')]
>>> L_names
['file4.doc', 'list.doc', 'file4g.doc', 'file4r.doc', 'file4k.doc', 'ole.doc', 'os_references(9).doc', 'REF.doc']
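If the paths can be written as raw strings in the first place, the raw() helper is not needed. The following is a rough, self-contained sketch of the same re.split idea under that assumption; the shortened list and the names file_names and stems are made up for illustration, and the extension is stripped with rsplit as an extra step not shown above.

```python
import re

paths = [r'C:\Design\dw\file4.doc',
         r'C:\Design/Dq/test1.doc',
         r'C:\GE\easy\file\os_references(9).doc']

# Split on either separator and keep the last component of each path.
file_names = [re.split(r'[/\\]', path)[-1] for path in paths]

# Drop the extension by splitting once, from the right, on the final dot.
stems = [name.rsplit('.', 1)[0]
         for name in file_names if name.endswith('.doc')]

print(file_names)  # ['file4.doc', 'test1.doc', 'os_references(9).doc']
print(stems)       # ['file4', 'test1', 'os_references(9)']
```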

The first important point is that you should write your list with raw strings (r prefix):

L = [r'C:\Design\dw\file4.doc',
     r'C:\light\PDF\downloads\list.doc',
     …]

Otherwise, escape sequences in your file names are interpreted: a backslash followed by certain characters (for example \f) is replaced by a single character.

Python 2 has a dedicated sub-module just for manipulating paths, which gives you the expected result:

from os.path import basename, splitext                                          
print [splitext(basename(path))[0] for path in L]

Note that this script must run on a system that uses the same path separator convention (/ or \) as the paths themselves, which is usually the case, since paths generally only make sense locally on a machine. You can make it work specifically for Windows paths (on any operating system) by doing this instead:

from ntpath import basename, splitext 

You then get, on any machine:

['file4', 'list', 'file4g', 'file4r', 'file4k', 'ole', 'os_references(9)', 'REF']
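To see why the choice of module matters, this small sketch runs the same Windows path through ntpath and posixpath (the module os.path resolves to on Linux and macOS). The variable names are illustrative only, and the code is written to run on both Python 2 and Python 3.

```python
import ntpath
import posixpath

path = r'C:\Design\Dq\file4g.doc'

# ntpath understands backslash separators on any operating system.
windows_name = ntpath.splitext(ntpath.basename(path))[0]

# posixpath only treats / as a separator, so the whole string counts
# as one file name and only the extension is removed.
posix_name = posixpath.splitext(posixpath.basename(path))[0]

# ntpath also accepts forward slashes mixed into a Windows path.
mixed_name = ntpath.splitext(ntpath.basename(r'C:\Design/Dq/test1.doc'))[0]

print(windows_name)  # file4g
print(posix_name)    # C:\Design\Dq\file4g
print(mixed_name)    # test1
```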
