Remove first char from each line in a text file
I'm new to Python, and to programming in general.
I want to remove the first character from each line in a text file and write the changes back to the file. For example, I have a file with 36 lines, and the first character of each line is a symbol or a number that I want removed.
I wrote a little code here, but it doesn't work as expected: it only duplicates whole lines. Any help would be appreciated. Thanks in advance!
from sys import argv
run, filename = argv
f = open(filename, 'a+')
f.seek(0)
lines = f.readlines()
for line in lines:
    f.write(line[1:])
f.close()
Your code already does remove the first character. I saved exactly your code as both dupy.py and dupy.txt, then ran python dupy.py dupy.txt, and the result is:
from sys import argv
run, filename = argv
f = open(filename, 'a+')
f.seek(0)
lines = f.readlines()
for line in lines:
    f.write(line[1:])
f.close()
rom sys import argv
un, filename = argv
= open(filename, 'a+')
.seek(0)
ines = f.readlines()
or line in lines:
   f.write(line[1:])
.close()
It's not copying entire lines; it's copying lines with their first character stripped.
But from the initial statement of your problem, it sounds like you want to overwrite the lines, not append new copies. To do that, don't use append mode. Read the file, then write it:
from sys import argv
run, filename = argv
f = open(filename)
lines = f.readlines()
f.close()
f = open(filename, 'w')
for line in lines:
    f.write(line[1:])
f.close()
Alternatively, write a new file, then move it on top of the original when you're done:
import os
from sys import argv
run, filename = argv
fin = open(filename)
fout = open(filename + '.tmp', 'w')
lines = fin.readlines()
for line in lines:
    fout.write(line[1:])
fout.close()
fin.close()
os.rename(filename + '.tmp', filename)
(Note that this version will not work as-is on Windows, but it's simpler than the actual cross-platform version; if you need Windows, I can explain how to do this.)
You can make the code a lot simpler, more robust, and more efficient by using with statements, looping directly over the file instead of calling readlines, and using tempfile:
import os
import tempfile
from sys import argv
run, filename = argv
# mode='w' opens the temp file in text mode (the default is binary)
with open(filename) as fin, tempfile.NamedTemporaryFile(mode='w', delete=False) as fout:
    for line in fin:
        fout.write(line[1:])
os.rename(fout.name, filename)
On most platforms, this guarantees an "atomic write": when your script finishes, or even if someone pulls the plug in the middle of it running, the file will end up either replaced by the new version or untouched. There's no way it can end up half-overwritten into unrecoverable garbage.
Again, this version won't work on Windows. Without a whole lot of work, there is no way to implement this "write-temp-and-rename" algorithm on Windows. But you can come close with only a bit of extra work:
# mode='w' opens the temp file in text mode (the default is binary)
with open(filename) as fin, tempfile.NamedTemporaryFile(mode='w', delete=False) as fout:
    for line in fin:
        fout.write(line[1:])
    outname = fout.name
os.remove(filename)
os.rename(outname, filename)
This does prevent you from half-overwriting the file, but it leaves a hole where you may have deleted the original file and left the new file in a temporary location that you'll have to search for. You can make this a little nicer by putting the file somewhere easier to find (see the NamedTemporaryFile docs to see how), or by renaming the original file to a temporary name, then writing to the original filename, then deleting the renamed original. There are various other possibilities, but actually getting the same behavior as on other platforms is very difficult.
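If Python 3.3 or later is available, os.replace may be worth trying in place of the remove-then-rename pair, since it overwrites an existing destination on Windows as well as on POSIX systems. The following is only a sketch under that assumption: the function name strip_first_char is made up for illustration, and the temp file is created next to the target (rather than in the default temp directory) so that both are on the same filesystem. Whether the replacement is fully atomic on Windows is not something this sketch claims.

```python
import os
import tempfile


def strip_first_char(filename):
    """Rewrite filename with the first character of each line removed.

    Hypothetical helper, not part of the original answer.
    """
    # Keep the temp file in the same directory as the target so the final
    # replace never has to cross a filesystem boundary.
    target_dir = os.path.dirname(os.path.abspath(filename))
    with open(filename) as fin, tempfile.NamedTemporaryFile(
            mode='w', dir=target_dir, delete=False) as fout:
        for line in fin:
            fout.write(line[1:])
    # Both files are closed here; os.replace overwrites an existing destination.
    os.replace(fout.name, filename)
```

Calling strip_first_char('sampleCode.txt') would then rewrite that file; the original stays untouched until the very last step.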
You can either read all lines into memory and then recreate the file:
from sys import argv
run, filename = argv
with open(filename, 'r') as f:
    data = [i[1:] for i in f]
with open(filename, 'w') as f:
    f.writelines(data)  # each line still ends with its own newline, so none is added
Or you can create another file and copy the data from the first file to the second, line by line. Then you can rename it if you'd like:
import os
from sys import argv
run, filename = argv
new_name = filename + '.tmp'
with open(filename, 'r') as f_in, open(new_name, 'w') as f_out:
    for line in f_in:
        f_out.write(line[1:])
os.rename(new_name, filename)
At its most basic, your problem is that you need to seek back to the beginning of the file after you read its complete contents into the list lines. Since you are making the file shorter, you also need to use truncate to adjust the official length of the file after you're done. Furthermore, open mode a+ (a is for append) overrides seek and forces all writes to go to the end of the file. So your code should look something like this:
import sys
def main(argv):
    filename = argv[1]
    with open(filename, 'r+') as f:
        lines = f.readlines()
        f.seek(0)
        for line in lines:
            f.write(line[1:])
        f.truncate()
if __name__ == '__main__': main(sys.argv)
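The append-mode behaviour described above can be checked with a small experiment. This is only a sketch: the file name demo.txt and its contents are invented for the demonstration, and the Python docs only say that append mode sends writes to the end of the file "on some Unix systems", so the result may depend on the platform.

```python
import os
import tempfile

# Build a small throwaway file to experiment on (hypothetical name and contents).
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')
with open(path, 'w') as f:
    f.write('1abc\n2def\n')

# Same pattern as the question: open in 'a+', rewind, read, then write.
with open(path, 'a+') as f:
    f.seek(0)
    lines = f.readlines()
    f.seek(0)  # rewinding again does not change where append-mode writes land
    for line in lines:
        f.write(line[1:])

with open(path) as f:
    result = f.read()
print(result)
```

Where append mode behaves as described, the printed file still starts with the two original lines, followed by the stripped copies, which matches the duplication seen in the question.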
It is better, when doing something like this, to write the changes to a new file and then rename it over the old file when you're done. This causes the update to happen "atomically": a concurrent reader sees either the old file or the new one, not some mangled combination of the two. That looks like this:
import os
import sys
import tempfile
def main(argv):
    filename = argv[1]
    with open(filename, 'r') as inf:
        # mode='w' gives a text-mode temp file; the default is binary
        with tempfile.NamedTemporaryFile(mode='w', dir=".", delete=False) as outf:
            tname = outf.name
            for line in inf:
                outf.write(line[1:])
    os.rename(tname, filename)
if __name__ == '__main__': main(sys.argv)
(Note: Atomically replacing a file via rename does not work on Windows; you have to os.remove the old name first. This unfortunately does mean there is a brief window (no pun intended) where a concurrent reader will find that the file does not exist. As far as I know there is no way to avoid this.)
import re
# filename is the path of the file to edit
with open(filename,'r+') as f:
    modified = re.sub('^.','',f.read(),flags=re.MULTILINE)
    f.seek(0,0)
    f.write(modified)
    f.truncate()
In the regex pattern:
^ means 'start of string'
^ with flag re.MULTILINE means 'start of line'
^. means 'exactly one character at the start of a line'
The start of a line is the start of the string or any position after a newline (a newline is \n).
So we may fear that some newlines in sequences like \n\n\n\n\n\n\n could match the regex pattern.
But the dot stands for any character EXCEPT a newline, so none of the newlines match this regex pattern.
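The point about runs of newlines can be checked directly on a string. The sample text below is invented for illustration; it mixes numbered lines with empty ones.

```python
import re

# Hypothetical sample: three numbered lines separated by empty lines.
text = '1abc\n\n2def\n\n\n3ghi\n'

# With re.MULTILINE, ^ matches at the start of every line; the dot never
# consumes a newline, so the empty lines pass through unchanged.
stripped = re.sub('^.', '', text, flags=re.MULTILINE)
print(repr(stripped))
```

One consequence worth noting: the regex leaves empty lines in place, whereas the line[1:] approach used in the other answers turns an empty line ('\n') into an empty string and so drops it from the output.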
During the reading of the file triggered by f.read(), the file's pointer moves to the end of the file.
f.seek(0,0) moves the file's pointer back to the beginning of the file.
f.truncate() puts a new EOF (end of file) at the point where the writing has stopped. It's necessary since the modified text is shorter than the original one.
Compare what the code does with and without this line.
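The comparison suggested above could be set up roughly as follows. This is a sketch under assumptions: the helper strip_in_place, its truncate parameter, and the demo file names are all invented for the experiment, and the same regex approach is run once with the truncate call and once without.

```python
import os
import re
import tempfile


def strip_in_place(path, truncate):
    """Strip the first character of each line; optionally truncate afterwards."""
    with open(path, 'r+') as f:
        modified = re.sub('^.', '', f.read(), flags=re.MULTILINE)
        f.seek(0, 0)
        f.write(modified)
        if truncate:
            f.truncate()


folder = tempfile.mkdtemp()
results = {}
for truncate in (True, False):
    # Each run gets its own fresh copy of the same sample content.
    path = os.path.join(folder, 'demo_%s.txt' % truncate)
    with open(path, 'w') as f:
        f.write('1abc\n2def\n')
    strip_in_place(path, truncate)
    with open(path) as f:
        results[truncate] = f.read()

print(repr(results[True]))   # file after writing and truncating
print(repr(results[False]))  # file after writing without truncating
```

Without the truncate call, the file keeps its original length, so the tail of the old content is still there after the newly written text.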
To be honest, I'm really not sure how good or bad an idea nesting with open() is, but you can do something like this.
with open(filename_you_reading_lines_FROM, 'r') as f0:
    with open(filename_you_appending_modified_lines_TO, 'a') as f1:
        for line in f0:
            f1.write(line[1:])
While there seemed to be some discussion of best practice and whether it would run on Windows or not, being new to Python, I was able to take the first example that worked and get it to run in my Windows environment (which has the cygwin binaries in my Path environment variable) and remove the first 3 characters, which were line numbers in a sample file:
import os
from sys import argv
run, filename = argv
fin = open(filename)
fout = open(filename + '.tmp', 'w')
lines = fin.readlines()
for line in lines:
    fout.write(line[3:])
fout.close()
fin.close()
I chose not to automatically overwrite since I wanted to be able to eyeball the output.
python c:\bin\remove1st3.py sampleCode.txt