
String Replacement and Saving to a New File (Python v2.7)

I am trying to replace all lines of a certain format in a file with blanks, i.e. replace a line of number/number/number (like a date) and number:number (like a time) with "". I want to read from the old file and then save the scrubbed version as a new file.

This is the code I have so far (I know it is way off):

old_file = open("old_text.txt", "r")
new_file = open("new_text.txt", "w")

print (old_file.read())

for line in old_file.readlines():
    cleaned_line = line.replace("%/%/%", "")
    cleaned_line = line.replace("%:%", "")
    new_file.write(cleaned_line)

old_file.close
new_file.close

Thank you for your help, Ben

I am trying to replace all lines of a certain format in a file with blanks, i.e. replace a line of number/number/number (like a date) and number:number (like a time) with "".

You can't use str.replace to match a pattern or format, only a literal string.

To match a pattern, you need some kind of parser. For patterns like this, the regular expression engine built into the standard library as re is more than powerful enough, but you will need to learn how to write regular expressions for your patterns. The reference docs and the Regular Expression HOWTO are great if you already know the basics; if not, you should search for a tutorial elsewhere.
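To make the literal-versus-pattern distinction concrete, here is a minimal sketch. The sample line is invented for illustration; only str.replace and re.sub come from the discussion above.

```python
import re

# Hypothetical sample line, not taken from the asker's file.
line = "Meeting on 12/25/2013 at 10:30 in room 4\n"

# str.replace looks for the literal text "%/%/%". That text is not in the
# line, so the result is identical to the input.
unchanged = line.replace("%/%/%", "")

# re.sub treats its first argument as a pattern: \d+ means "one or more digits".
without_date = re.sub(r"\d+/\d+/\d+", "", line)

print(repr(unchanged))
print(repr(without_date))
```

Note that "%" has no wildcard meaning in either function; it is just an ordinary character.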

Anyway, here's how you'd do this (fixing a few other things along the way, most of them explained by Lego Stormtroopr):

import re

with open("old_text.txt") as old_file, open("new_text.txt", "w") as new_file:
    for line in old_file:
        cleaned_line = re.sub(r'\d+/\d+/\d+', '', line)
        cleaned_line = re.sub(r'\d+:\d+', '', cleaned_line)
        new_file.write(cleaned_line)

Also, note that I used cleaned_line in the second sub; just using line again, as in your original code, means we lose the results of the first substitution.
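A small sketch of why the second call has to start from the first call's result. The sample line and the variable names lost and kept are made up for the illustration.

```python
import re

# Hypothetical sample line containing both a date and a time.
line = "12/25/2013 10:30 hello\n"

# Buggy chaining: the second call starts from `line` again, so the date
# removal done by the first call is thrown away.
lost = re.sub(r"\d+/\d+/\d+", "", line)
lost = re.sub(r"\d+:\d+", "", line)

# Correct chaining: the second call works on the first call's result.
kept = re.sub(r"\d+/\d+/\d+", "", line)
kept = re.sub(r"\d+:\d+", "", kept)

print(repr(lost))  # the date is still there
print(repr(kept))  # both the date and the time are gone
```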

Without knowing the exact definition of your problem, I can't promise that this does exactly what you want. Do you want to blank all lines that contain the pattern number/number/number, blank out all lines that are nothing but that pattern, or blank out just that pattern and leave the rest of the line alone? All of those things are doable, and pretty easy, with re, but they're all done a little differently.
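The three readings could be sketched roughly as follows. The function names are hypothetical, only the date pattern is handled, and "blank" is assumed to mean "replace with an empty line but keep the newline"; adjust to taste.

```python
import re

DATE = r"\d+/\d+/\d+"

def blank_if_contains(line):
    # Reading 1: blank the whole line if the pattern appears anywhere in it.
    return "\n" if re.search(DATE, line) else line

def blank_if_only(line):
    # Reading 2: blank the line only if it is nothing but the pattern
    # (surrounding whitespace is allowed; that is an assumption).
    return "\n" if re.match(r"\s*" + DATE + r"\s*$", line) else line

def blank_pattern_only(line):
    # Reading 3: remove just the matched text and keep the rest of the line.
    return re.sub(DATE, "", line)

sample = "due 1/2/2013 ok\n"
print(repr(blank_if_contains(sample)))
print(repr(blank_if_only(sample)))
print(repr(blank_pattern_only(sample)))
```

Any of the three can be dropped into the loop in place of the re.sub calls.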


If you want to get a little trickier, you can use a single re.sub expression to replace all of the matching lines with blank lines at once, instead of iterating over them one at a time. That means a slightly more complicated regexp in exchange for slightly simpler Python code, and probably better performance for mid-sized files but worse performance (and an upper limit) for huge files, and so on. If you can't figure out how to write the appropriate expression yourself, and there's no performance bottleneck to fix, I'd stick with explicit looping.
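One possible shape for that single expression, as a sketch only: it assumes "matching line" means any line containing a date or a time, and it runs on an in-memory string standing in for the result of old_file.read().

```python
import re

# Stand-in for old_file.read(); the contents are invented for illustration.
text = "keep me\n12/25/2013\nalso keep\n10:30\n"

# re.MULTILINE makes ^ and $ match at every line boundary, and "." does not
# cross newlines, so each match covers exactly one line (minus its newline).
pattern = re.compile(r"^.*(?:\d+/\d+/\d+|\d+:\d+).*$", re.MULTILINE)

# Matching lines become empty lines; the newlines themselves are untouched.
cleaned = pattern.sub("", text)

print(repr(cleaned))
```

Because the whole file has to be held in memory as one string, this is where the upper limit for huge files comes from.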

Firstly, there are some indentation issues, where the for loop was indented for no reason. Secondly, as soon as you read the file you have seeked to the end, so there are no more lines to read. Lastly, the with statement lets you open a file and bind it to a variable name, and it closes the file for you whether the block ends normally or with an error, so you don't have to close it manually.
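The second point is easy to check for yourself. In this sketch io.StringIO stands in for a real file so the example needs nothing on disk; a file opened with open() behaves the same way here.

```python
import io

# In-memory stand-in for open("old_text.txt"); the contents are invented.
old_file = io.StringIO(u"first\nsecond\n")

contents = old_file.read()       # consumes everything; position is now at the end
leftover = old_file.readlines()  # nothing left to read, so this is an empty list

old_file.seek(0)                 # rewind to the beginning of the file
lines_after_rewind = old_file.readlines()

print(leftover)
print(lines_after_rewind)
```

So the print(old_file.read()) in the question leaves the for loop with nothing to iterate over; either drop that call or seek(0) before looping.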

To perform the actual logic, however, you probably want to use a regular expression. You can use re.search() to find the pattern:

  • \d+:\d+ for one or more digits, a colon and one or more digits
  • \d+/\d+/\d+ for three lots of one or more digits, with a literal / between them.

The code you want is closer to this:

import re

with open("old_text.txt", "r") as old_file, open("new_text.txt", "w") as new_file:
    for line in old_file:
        # Blank the line if a time (digits:digits) appears anywhere in it
        if re.search(r"\d+:\d+", line) is not None:
            line = ""
        # Blank the line if a date (digits/digits/digits) appears anywhere in it
        if re.search(r"\d+/\d+/\d+", line) is not None:
            line = ""
        new_file.write(line)

If you only want to match at the beginning of the line, re.match() will probably be a better choice.
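A quick sketch of the difference, using made-up sample lines: re.search scans the whole string, while re.match only succeeds if the pattern matches starting at the first character.

```python
import re

TIME = r"\d+:\d+"

# Time in the middle of the line: search finds it, match does not.
found_anywhere = re.search(TIME, "call at 10:30\n")
not_at_start = re.match(TIME, "call at 10:30\n")

# Time at the very start of the line: match succeeds.
at_start = re.match(TIME, "10:30 call\n")

print(found_anywhere is not None)
print(not_at_start is None)
print(at_start.group(0))
```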

Here we declare a block with our two files, loop through old_file, clean each line and write to new_file. Once the end of old_file is reached, both files are cleanly closed. If either file cannot be opened, or an error occurs inside the block, the exception still propagates, but the with block makes sure any file that was already opened is closed.
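To see what the with block does and does not do on failure, here is a small sketch. The temporary directory, the simulated ValueError and the variable names are all invented for the demonstration.

```python
import os
import tempfile

# Scratch location so the example does not touch real files.
path = os.path.join(tempfile.mkdtemp(), "new_text.txt")

try:
    with open(path, "w") as new_file:
        new_file.write("partial\n")
        raise ValueError("simulated failure")  # hypothetical error mid-loop
except ValueError:
    # The exception still reaches us; `with` did not swallow it.
    error_seen = True

# The file was closed on the way out of the block, despite the error.
closed_after_error = new_file.closed

try:
    with open(os.path.join(tempfile.mkdtemp(), "missing.txt")) as old_file:
        pass
except IOError:
    # Opening a file that does not exist raises before the block is entered.
    missing_reported = True

print(closed_after_error)
```

If you want to handle a missing input file gracefully, wrap the with statement in a try/except as shown; the with statement on its own only guarantees cleanup.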

