Remove first char from each line in a text file

I'm new to Python, and to programming in general.

I want to remove the first char from each line in a text file and write the changes back to the file. For example, I have a file with 36 lines, and the first char in each line is a symbol or a number that I want removed.

I wrote a little code here, but it doesn't work as expected: it only duplicates whole lines. Any help would be appreciated. Thanks in advance!

from sys import argv

run, filename = argv

f = open(filename, 'a+')
f.seek(0)
lines = f.readlines()
for line in lines:
    f.write(line[1:])
f.close()

Your code already does remove the first character. I saved exactly your code as both dupy.py and dupy.txt, then ran python dupy.py dupy.txt, and the result is:

from sys import argv

run, filename = argv

f = open(filename, 'a+')
f.seek(0)
lines = f.readlines()
for line in lines:
    f.write(line[1:])
f.close()
rom sys import argv
un, filename = argv
 = open(filename, 'a+')
.seek(0)
ines = f.readlines()
or line in lines:
   f.write(line[1:])
.close()

It's not copying entire lines; it's copying lines with their first character stripped.
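To see why the stripped lines land after the original text instead of replacing it, here is a minimal sketch. It assumes a platform such as Linux where append mode sends every write to the end of the file regardless of the seek position; the helper name append_after_seek and the throwaway temp file are made up for illustration.

```python
import os
import tempfile

def append_after_seek(path):
    """Open in 'a+' mode, seek to the start, write, and return the final content."""
    with open(path, 'a+') as f:
        f.seek(0)          # move to the start so the file can be read
        f.read()           # read everything, as the question's code does
        f.seek(0)          # seek back to the start before writing
        f.write('NEW\n')   # append mode is expected to put this at the end anyway
    with open(path) as f:
        return f.read()

# Demonstration on a throwaway file
fd, demo_path = tempfile.mkstemp(suffix='.txt')
os.close(fd)
with open(demo_path, 'w') as f:
    f.write('first\nsecond\n')
result = append_after_seek(demo_path)
os.remove(demo_path)
print(repr(result))
```

If append mode behaves as described, the printed text still begins with the two original lines and the new line comes last.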


But from the initial statement of your problem, it sounds like you want to overwrite the lines, not append new copies. To do that, don't use append mode. Read the file, then write it:

from sys import argv

run, filename = argv

f = open(filename)
lines = f.readlines()
f.close()
f = open(filename, 'w')
for line in lines:
    f.write(line[1:])
f.close()

Or, alternatively, write a new file, then move it on top of the original when you're done:

import os
from sys import argv

run, filename = argv

fin = open(filename)
fout = open(filename + '.tmp', 'w')
lines = fin.readlines()
for line in lines:
    fout.write(line[1:])
fout.close()
fin.close()
os.rename(filename + '.tmp', filename)

(Note that this version will not work as-is on Windows, but it's simpler than the actual cross-platform version; if you need Windows, I can explain how to do this.)


You can make the code a lot simpler, more robust, and more efficient by using with statements, looping directly over the file instead of calling readlines, and using tempfile:

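import os
import tempfile
from sys import argv

run, filename = argv

# mode='w' opens the temp file in text mode; the default is binary ('w+b')
with open(filename) as fin, tempfile.NamedTemporaryFile(mode='w', delete=False) as fout:
    for line in fin:
        fout.write(line[1:])
    # os.rename needs both paths on the same filesystem; pass dir=... to
    # NamedTemporaryFile if your temp directory lives elsewhere
    os.rename(fout.name, filename)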

On most platforms, this guarantees an "atomic write": when your script finishes, or even if someone pulls the plug in the middle of it running, the file will end up either replaced by the new version, or untouched; there's no way it can end up half-way overwritten into unrecoverable garbage.

Again, this version won't work on Windows. Without a whole lot of work, there is no way to implement this "write-temp-and-rename" algorithm on Windows. But you can come close with only a bit of extra work:

with open(filename) as fin, tempfile.NamedTemporaryFile(mode='w', delete=False) as fout:
    for line in fin:
        fout.write(line[1:])
    outname = fout.name
# both files are closed here, before the original is removed and replaced
os.remove(filename)
os.rename(outname, filename)

This does prevent you from half-overwriting the file, but it leaves a hole where you may have deleted the original file and left the new file in a temporary location that you'll have to search for. You can make this a little nicer by putting the file somewhere easier to find (see the NamedTemporaryFile docs to see how). Or you can rename the original file to a temporary name, then write to the original filename, then delete the renamed original. Or there are various other possibilities. But actually getting the same behavior as on other platforms is very difficult.
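The "rename the original, write to the original filename, then delete the renamed original" option could be sketched roughly as follows. The function name and the '.bak' suffix are hypothetical choices, and the sketch assumes no file with the backup name already exists.

```python
import os

def strip_first_char_with_backup(filename):
    """Strip the first character of every line, keeping a backup until the write is done."""
    backup = filename + '.bak'       # hypothetical backup name, next to the original
    os.rename(filename, backup)      # the original content now lives under the backup name
    with open(backup) as fin, open(filename, 'w') as fout:
        for line in fin:
            fout.write(line[1:])
    os.remove(backup)                # only reached if the new file was written without error
```

If the script dies part-way through, the untouched original is still sitting next to the target under the backup name, which is easier to find than a file in the system temp directory.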

You can either read all the lines into memory and then recreate the file,

from sys import argv

run, filename = argv

with open(filename, 'r') as f:
    data = [line[1:] for line in f]  # each line keeps its trailing newline
with open(filename, 'w') as f:
    f.writelines(data)  # no need to add '\n' again; text mode handles line endings

or you can create another file and copy the data from the first file to the second line by line. Then you can rename it if you'd like:

import os
from sys import argv

run, filename = argv

new_name = filename + '.tmp'
with open(filename, 'r') as f_in, open(new_name, 'w') as f_out:
    for line in f_in:
        f_out.write(line[1:])

os.rename(new_name, filename)

At its most basic, your problem is that you need to seek back to the beginning of the file after you read its complete contents into the array lines. Since you are making the file shorter, you also need to use truncate to adjust the official length of the file after you're done. Furthermore, open mode a+ (a is for append) overrides seek and forces all writes to go to the end of the file. So your code should look something like this:

import sys

def main(argv):
    filename = argv[1]
    with open(filename, 'r+') as f:
        lines = f.readlines()
        f.seek(0)
        for line in lines:
            f.write(line[1:])
        f.truncate()

if __name__ == '__main__': main(sys.argv)

It is better, when doing something like this, to write the changes to a new file and then rename it over the old file when you're done. This causes the update to happen "atomically": a concurrent reader sees either the old file or the new one, not some mangled combination of the two. That looks like this:

import os
import sys
import tempfile

def main(argv):
    filename = argv[1]
    with open(filename, 'r') as inf:
        with tempfile.NamedTemporaryFile(mode='w', dir=".", delete=False) as outf:
            tname = outf.name
            for line in inf:
                outf.write(line[1:])
    os.rename(tname, filename)

if __name__ == '__main__': main(sys.argv)

(Note: Atomically replacing a file via rename does not work on Windows; you have to os.remove the old name first. This unfortunately does mean there is a brief window (no pun intended) where a concurrent reader will find that the file does not exist. As far as I know there is no way to avoid this.)
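On Python 3.3 and later, os.replace overwrites an existing destination file on Windows as well as on POSIX systems, so it may remove the need for the separate os.remove step. The sketch below is one way to use it, not a definitive fix: the function name is made up, the temp file is created next to the target so both paths share a filesystem, and the Python documentation only describes the rename as atomic where POSIX requires it.

```python
import os
import tempfile

def strip_first_char_atomic(filename):
    """Write stripped lines to a temp file beside `filename`, then swap it into place."""
    target_dir = os.path.dirname(os.path.abspath(filename))
    with open(filename, 'r') as inf:
        # mode 'w' gives a text-mode temp file; delete=False keeps it after closing
        with tempfile.NamedTemporaryFile('w', dir=target_dir, delete=False) as outf:
            tname = outf.name
            for line in inf:
                outf.write(line[1:])
    # both files are closed here; os.replace overwrites the destination if it exists
    os.replace(tname, filename)
```

One side effect to be aware of: the replacement file carries the temp file's permissions, which NamedTemporaryFile creates as readable and writable by the owner only.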

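import re
import sys

filename = sys.argv[1]  # path of the file to edit, taken from the command line

with open(filename,'r+') as f:
    modified = re.sub('^.','',f.read(),flags=re.MULTILINE)
    f.seek(0,0)
    f.write(modified)
    f.truncate()  # cut off the leftover tail of the longer original text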

In the regex pattern:
^ means 'start of string'
^ with the flag re.MULTILINE means 'start of line'

^. means 'exactly one character at the start of a line'

The start of a line is the start of the string or any position just after a newline (a newline is \n).
So we may fear that some newlines in sequences like \n\n\n\n\n\n\n could match the regex pattern.
But the dot stands for any character EXCEPT a newline, so none of the newlines match this regex pattern.
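A quick way to check the claim about consecutive newlines is to run the substitution on a small in-memory string. The sample text here is made up for illustration.

```python
import re

# Two numbered lines, two empty lines in between, and a line starting with a symbol
text = '1abc\n\n\n2def\n#ghi\n'
stripped = re.sub('^.', '', text, flags=re.MULTILINE)
print(repr(stripped))  # → 'abc\n\n\ndef\nghi\n'
```

The empty lines survive unchanged because the dot has nothing to consume at those positions.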

During the read triggered by f.read(), the file's pointer advances to the end of the file.

f.seek(0,0) moves the file's pointer back to the beginning of the file.

f.truncate() puts a new EOF (end of file) at the point where the writing stopped. It's necessary since the modified text is shorter than the original one.
Compare the result with what the code does without this line.
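The comparison can be sketched like this, using a line-based rewrite in r+ mode instead of the regex and a throwaway temp file. The helper names rewrite_in_place and demo are hypothetical.

```python
import os
import tempfile

def rewrite_in_place(path, truncate):
    """Strip the first char of each line in place; optionally truncate afterwards."""
    with open(path, 'r+') as f:
        lines = f.readlines()
        f.seek(0)                 # go back to the start before overwriting
        for line in lines:
            f.write(line[1:])
        if truncate:
            f.truncate()          # drop whatever is left of the old, longer content
    with open(path) as f:
        return f.read()

def demo(truncate):
    """Run rewrite_in_place on a fresh two-line temp file and return the result."""
    fd, path = tempfile.mkstemp(suffix='.txt')
    os.close(fd)
    with open(path, 'w') as f:
        f.write('1abc\n2def\n')
    try:
        return rewrite_in_place(path, truncate)
    finally:
        os.remove(path)

with_truncate = demo(True)
without_truncate = demo(False)
print(repr(with_truncate))
print(repr(without_truncate))
```

On a system with \n line endings, the truncated version is the two stripped lines and nothing else, while the version without truncate keeps the last two characters of the old content as a stray extra line.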

To be honest, I'm really not sure how good or bad an idea nesting with open() is, but you can do something like this.

with open(filename_you_reading_lines_FROM, 'r') as f0:
    with open(filename_you_appending_modified_lines_TO, 'a') as f1:
        for line in f0:
            f1.write(line[1:])

While there seemed to be some discussion of best practice and whether it would run on Windows or not, being new to Python, I was able to take the first example that worked and run it in my Windows environment, which has the cygwin binaries on my Path environment variable, to remove the first 3 characters (which were line numbers in a sample file):

import os
from sys import argv

run, filename = argv

fin = open(filename)
fout = open(filename + '.tmp', 'w')
lines = fin.readlines()
for line in lines:
    fout.write(line[3:])
fout.close()
fin.close()

I chose not to automatically overwrite the original, since I wanted to be able to eyeball the output.

python c:\bin\remove1st3.py sampleCode.txt
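One thing to watch when slicing off more than one character: a line shorter than the prefix, such as an empty line, loses its newline too and so disappears from the output. A small sketch with made-up sample lines and a hypothetical helper name:

```python
def strip_prefix(lines, n):
    """Return each line with its first n characters removed."""
    return [line[n:] for line in lines]

# Two numbered lines followed by an empty line
sample = ['01 import os\n', '02 import sys\n', '\n']
result = strip_prefix(sample, 3)
print(result)  # → ['import os\n', 'import sys\n', '']
```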
