
Python string.replace() not replacing characters

Some background information: Where I work, we have an ancient web-based document database system, consisting almost entirely of MS Office documents with the "normal" extensions (.doc, .xls, .ppt). They are all named after some sort of arbitrary ID number (e.g. 1245.doc). We're switching to SharePoint, and I need to rename all of these files and sort them into folders. I have a CSV file with all sorts of information (such as which ID number corresponds to which document's title), so I'm using it to rename the files. I've written a short Python script that replaces the ID-number name with the title.

However, some of the document titles contain slashes and other characters that are likely to cause trouble in a file name, so I want to replace them with underscores:

bad_characters = ["/", "\\", ":", "(", ")", "<", ">", "|", "?", "*"]
for letter in bad_characters:
    filename = line[2].replace(letter, "_")
    foldername = line[5].replace(letter, "_")
  • Example of line[2]: "Blah blah boring - meeting 2/19/2008.doc"
  • Example of line[5]: "Business meetings 2/2008"

When I add print letter inside the for loop, it prints the letter it's supposed to be replacing, but the character isn't actually replaced with an underscore as I want.

Is there anything I'm doing wrong here?

That's because filename and foldername are overwritten on every iteration of the loop. The .replace() method returns a new string, and each pass starts again from line[2] and line[5], so the result of the previous replacement is discarded.
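A small sketch of that point: Python strings are immutable, so .replace() never changes the string it is called on and instead hands back a new one. The sample title is the one from the question; the variable names are only illustrative.

```python
title = "Blah blah boring - meeting 2/19/2008.doc"

# .replace() returns a new string; the original is left untouched.
replaced = title.replace("/", "_")

print(title)     # still contains the slashes
print(replaced)  # slashes replaced with underscores
```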

You should use:

filename = line[2]
foldername = line[5]

for letter in bad_characters:
    filename = filename.replace(letter, "_")
    foldername = foldername.replace(letter, "_")

But I would do it with a regex. It's cleaner and (likely) faster:

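import re

# One character class covers every bad character; the raw string keeps \\ as a literal backslash
p = re.compile(r'[/\\:()<>|?*]')
filename = p.sub('_', line[2])
foldername = p.sub('_', line[5])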
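As a self-contained sketch of the regex approach, the character class can also be built from the bad_characters list in the question, so the two stay in sync. The helper name sanitize is hypothetical, and building the pattern with re.escape() is one possible choice rather than the answerer's exact pattern.

```python
import re

bad_characters = ["/", "\\", ":", "(", ")", "<", ">", "|", "?", "*"]

# re.escape() makes each character safe to place inside a character class.
pattern = re.compile("[" + "".join(re.escape(c) for c in bad_characters) + "]")

def sanitize(name):
    """Replace every bad character in name with an underscore."""
    return pattern.sub("_", name)

print(sanitize("Blah blah boring - meeting 2/19/2008.doc"))
print(sanitize("Business meetings 2/2008"))
```

Compiling the pattern once and reusing it avoids rebuilding it for every row of the CSV file.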

You are reassigning filename and foldername from the original strings on every iteration of the loop. In effect, only * (the last character in the list) ends up being replaced.
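The difference can be seen side by side in a short sketch. Here line2 stands in for line[2] from the question, and the names buggy and fixed are only illustrative.

```python
bad_characters = ["/", "\\", ":", "(", ")", "<", ">", "|", "?", "*"]
line2 = "Blah blah boring - meeting 2/19/2008.doc"  # stands in for line[2]

# Buggy version: every pass starts again from the original string,
# so only the last replacement ("*") survives.
for letter in bad_characters:
    buggy = line2.replace(letter, "_")

# Fixed version: each pass works on the result of the previous one.
fixed = line2
for letter in bad_characters:
    fixed = fixed.replace(letter, "_")

print(buggy)  # unchanged, because the title contains no "*"
print(fixed)  # slashes replaced with underscores
```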

Take a look at the Python string method translate() (http://docs.python.org/library/string.html#string.translate), used together with string.maketrans (http://docs.python.org/library/string.html#string.maketrans).

Editing this to add an example, as suggested in the comment below:

import string  # string.maketrans exists in Python 2 only; Python 3 offers str.maketrans
toreplace = ''.join(["/", "\\", ":", "(", ")", "<", ">", "|", "?", "*"])
underscore = ''.join(['_'] * len(toreplace))
transtable = string.maketrans(toreplace, underscore)
filename = filename.translate(transtable)
foldername = foldername.translate(transtable)

You can simplify this by writing toreplace directly as a string, something like '/\\:' and so on; I just reused the list given above.
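For readers on Python 3, where string.maketrans no longer exists, a roughly equivalent sketch uses str.maketrans instead. The sample strings are the ones from the question; the name table is illustrative.

```python
bad_characters = '/\\:()<>|?*'

# Python 3: str.maketrans maps each character in the first string to the
# character at the same position in the second string.
table = str.maketrans(bad_characters, "_" * len(bad_characters))

filename = "Blah blah boring - meeting 2/19/2008.doc".translate(table)
foldername = "Business meetings 2/2008".translate(table)

print(filename)
print(foldername)
```

The table is built once and can be reused for every title, which makes this a single pass per string instead of one pass per bad character.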

You start over from the original line on every iteration instead of saving the replaced result, so what you get is equivalent to:

filename = line[2].replace('*', '_')
foldername = line[5].replace('*', '_')

Try the following:

bad_characters = ["/", "\\", ":", "(", ")", "<", ">", "|", "?", "*"]
filename = line[2]
foldername = line[5]
for letter in bad_characters:
    filename = filename.replace(letter, "_")
    foldername = foldername.replace(letter, "_")
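To show how this loop might sit inside the CSV-driven script described in the question, here is a minimal sketch. The helper name sanitize, the in-memory sample row, and every column other than 2 (title) and 5 (folder) are assumptions made for illustration; the real CSV layout is not shown in the thread.

```python
import csv
import io

BAD_CHARACTERS = ["/", "\\", ":", "(", ")", "<", ">", "|", "?", "*"]

def sanitize(name):
    """Replace each bad character with an underscore, one pass per character."""
    for letter in BAD_CHARACTERS:
        name = name.replace(letter, "_")  # keep the result of each pass
    return name

# Hypothetical row: only columns 2 and 5 come from the question,
# the other fields are placeholders.
sample = io.StringIO(
    '1245,x,"Blah blah boring - meeting 2/19/2008.doc",x,x,"Business meetings 2/2008"\n'
)

for line in csv.reader(sample):
    filename = sanitize(line[2])
    foldername = sanitize(line[5])
    print(foldername, filename)
```

Putting the loop in a function keeps the accumulation in one place, so it cannot be accidentally reset from line[2] again.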

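You can also use string.replace(str, fromStr, toStr) from the Python 2 string module, as long as each call works on the previous result rather than on line[2] and line[5] directly:

import string  # Python 2 only: string.replace() was removed in Python 3

bad_characters = ["/", "\\", ":", "(", ")", "<", ">", "|", "?", "*"]
filename = line[2]
foldername = line[5]
for letter in bad_characters:
    filename = string.replace(filename, letter, "_")
    foldername = string.replace(foldername, letter, "_")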

