Modifying each line in a text file in Python

I have a big file like the example below:

1   10161   10166   3
1   10166   10172   2
1   10172   10182   1
1   10183   10192   1
1   10193   10199   1
1   10212   10248   1
1   10260   10296   1
1   11169   11205   1
1   11336   11372   1
2   11564   11586   2
2   11586   11587   3
2   11587   11600   4
3   11600   11622   2

I would like to add "chr" at the beginning of each line, for example:

chr1    10161   10166   3
chr1    10166   10172   2
chr1    10172   10182   1
chr1    10183   10192   1
chr1    10193   10199   1
chr1    10212   10248   1
chr1    10260   10296   1
chr1    11169   11205   1
chr1    11336   11372   1
chr2    11564   11586   2
chr2    11586   11587   3
chr2    11587   11600   4
chr3    11600   11622   2

I tried the following code in Python:

   file = open("myfile.bg", "r")
   for line in file: 
      newline = "chr" + line
   out = open("outfile.bg", "w")
   for new in newline:
      out.write("n"+new)

But it did not return what I wanted. Do you know how to fix the code for this purpose?

The problem with your code is that you iterate over the input file without doing anything with the data you read:

file = open("myfile.bg", "r")
for line in file: 
    newline = "chr" + line

The last line assigns each line of myfile.bg, with 'chr' prepended, to the newline variable (a string), so each line overwrites the previous result.

Then you iterate over the string in newline (which will be the last line in the input file, with 'chr' prepended):

out = open("outfile.bg", "w")
for new in newline:       # <== this iterates over a string, so `new` will be individual characters
    out.write("n"+new)    # this only writes 'n' before each character in newline
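Both effects can be reproduced without touching any files. The following is a small sketch that uses an in-memory list in place of the input file; the two sample lines and the names last_only, chars and prefixed are made up for illustration.

```python
# Hypothetical sample standing in for the lines of myfile.bg.
lines = ["1\t10161\t10166\t3\n", "2\t11564\t11586\t2\n"]

for line in lines:
    newline = "chr" + line  # reassigned on every pass through the loop

# After the loop, only the last line is left in the variable.
last_only = newline

# Iterating over a string yields single characters, not lines.
chars = [new for new in newline]

# Collecting the results in a list keeps every line instead.
prefixed = ["chr" + line for line in lines]
```

Here last_only holds only the second line, and chars starts with 'c', 'h', 'r', which is why the output file ends up with one character per write.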

If you're just doing this once, e.g. in the shell, you could use the one-liner:

open('outfile.bg', 'w').writelines(['chr' + line for line in open('myfile.bg').readlines()])

A more correct version (especially in a program, where you would care about open file handles etc.) would be:

with open('myfile.bg') as infp:
    lines = infp.readlines()
with open('outfile.bg', 'w') as outfp:
    outfp.writelines(['chr' + line for line in lines])

If the file is really big (close to the size of your available memory), you'll need to process it incrementally:

with open('myfile.bg') as infp:
    with open('outfile.bg', 'w') as outfp:
        for line in infp:
            outfp.write('chr' + line)

(This may be somewhat slower than the first two versions, since it makes one write call per line, although buffered I/O usually keeps the difference small.)
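If you need this in more than one place, the incremental version could be wrapped in a small function. This is only a sketch: the name add_prefix, its parameters, and the temporary demo files are assumptions for illustration, not part of the original code.

```python
import os
import tempfile


def add_prefix(src_path, dst_path, prefix="chr"):
    """Copy src_path to dst_path, prepending prefix to every line.

    Lines are processed one at a time, so memory use stays small
    regardless of the file size. Returns the number of lines written.
    """
    count = 0
    with open(src_path) as infp, open(dst_path, "w") as outfp:
        for line in infp:
            outfp.write(prefix + line)
            count += 1
    return count


# Small demo in a temporary directory (sample data is made up).
tmpdir = tempfile.mkdtemp()
src = os.path.join(tmpdir, "myfile.bg")
dst = os.path.join(tmpdir, "outfile.bg")
with open(src, "w") as f:
    f.write("1\t10161\t10166\t3\n2\t11564\t11586\t2\n")

n_lines = add_prefix(src, dst)
with open(dst) as f:
    result = f.read()
```

Putting both open calls in a single with statement is equivalent to nesting them and saves one level of indentation.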

Totally agree with @rychaza; here's my version using your code:

file = open("myfile.bg", "r")
out = open("outfile.bg", "w")
for line in file:
    out.write("chr" + line)
out.close()
file.close()

The issue is that you iterate over the input and reassign the same variable (newline) for every line. You then open a file for output and iterate over newline, which is a string, so new will be each character in that string.

I think something like this should be what you're looking for:

with open('myfile.bg', 'rb') as file:
  with open('outfile.bg', 'wb') as out:
    for line in file:
      # binary mode yields bytes, so the prefix must be bytes as well
      out.write(b'chr' + line)

When iterating over a file, each line already contains its trailing newline, so there is no need to add one when writing.
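A quick way to check the newline behaviour is to iterate over an in-memory file. The sketch below uses io.StringIO as a stand-in for a real file, with made-up sample data whose last line deliberately has no newline.

```python
import io

# In-memory stand-in for the input file; the final line lacks a newline.
infp = io.StringIO("1\t10161\n2\t11564")
lines = list(infp)  # iterating a file object yields one line at a time

out = io.StringIO()
for line in lines:
    out.write("chr" + line)  # no extra "\n" is added here

written = out.getvalue()
```

Every line except possibly the last one ends with "\n", so the output keeps exactly the line breaks of the input.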

The with statements automatically close the file handles when the block ends.
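To illustrate the cleanup, here is a minimal sketch showing that the file object is closed once the with block is left, even when an exception is raised inside it. The temporary file name and the simulated error are assumptions for the demo.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.bg")

# Normal exit: the file is closed as soon as the block ends.
with open(path, "w") as out:
    out.write("chr1\t10161\t10166\t3\n")
closed_after_block = out.closed

# Exit via an exception: the file is still closed.
try:
    with open(path) as infp:
        raise ValueError("simulated failure while processing")
except ValueError:
    pass
closed_after_error = infp.closed
```

With plain open() and close() calls, an exception between the two would leave the file open unless you add a try/finally yourself.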
