Modifying each line in a text file in Python

I have a large file, like the following example:

1   10161   10166   3
1   10166   10172   2
1   10172   10182   1
1   10183   10192   1
1   10193   10199   1
1   10212   10248   1
1   10260   10296   1
1   11169   11205   1
1   11336   11372   1
2   11564   11586   2
2   11586   11587   3
2   11587   11600   4
3   11600   11622   2

I want to add "chr" at the beginning of every line, like this:

chr1    10161   10166   3
chr1    10166   10172   2
chr1    10172   10182   1
chr1    10183   10192   1
chr1    10193   10199   1
chr1    10212   10248   1
chr1    10260   10296   1
chr1    11169   11205   1
chr1    11336   11372   1
chr2    11564   11586   2
chr2    11586   11587   3
chr2    11587   11600   4
chr3    11600   11622   2

I tried the following code in Python:

file = open("myfile.bg", "r")
for line in file:
    newline = "chr" + line
out = open("outfile.bg", "w")
for new in newline:
    out.write("n"+new)

But it doesn't produce what I want. Do you know how to fix the code for this purpose?

The problem with your code is that you iterate over the input file without doing anything with the data you read:

file = open("myfile.bg", "r")
for line in file: 
    newline = "chr" + line

That last line assigns each line of myfile.bg, prefixed with 'chr', to the newline variable, and each iteration overwrites the previous result.

You then iterate over the string newline (which by then holds only the last line of the input file, prefixed with 'chr'):

out = open("outfile.bg", "w")
for new in newline:       # <== this iterates over a string, so `new` will be individual characters
    out.write("n"+new)    # this only writes 'n' before each character in newline
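Both problems can be reproduced without touching any files. This is a minimal sketch in which two sample lines from the question (an assumption, standing in for myfile.bg) show that only the last assignment to newline survives the loop, and that looping over a string yields single characters rather than lines.

```python
# Hypothetical stand-in for the contents of myfile.bg (two lines from the question).
lines = ["1   10161   10166   3\n", "1   10166   10172   2\n"]

# The loop from the question: each iteration rebinds `newline`,
# so after the loop only the last line (with "chr") is left.
for line in lines:
    newline = "chr" + line

print(repr(newline))  # 'chr1   10166   10172   2\n'

# Iterating over a str yields its individual characters, not lines.
chars = [new for new in newline]
print(chars[:4])  # ['c', 'h', 'r', '1']
```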

If you only need to do this once in an interactive shell, you can use a one-liner:

open('outfile.bg', 'w').writelines(['chr' + line for line in open('myfile.bg').readlines()])
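If you would rather not leave file handles open even in the shell, pathlib allows a similar one-liner in which read_text() and write_text() open and close the files themselves. A sketch, assuming Python 3; it runs in a temporary directory with made-up sample contents so it does not touch a real myfile.bg:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    src, dst = Path(tmp, "myfile.bg"), Path(tmp, "outfile.bg")
    src.write_text("1   10161   10166   3\n2   11564   11586   2\n")  # sample input

    # The one-liner: splitlines(keepends=True) keeps each "\n",
    # so joining the prefixed lines reproduces the original line breaks.
    dst.write_text("".join("chr" + line for line in src.read_text().splitlines(keepends=True)))

    result = dst.read_text()

print(result)
```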

A more correct version (especially inside a program, where you care about things like open file handles) would be:

with open('myfile.bg') as infp:
    lines = infp.readlines()
with open('outfile.bg', 'w') as outfp:
    outfp.writelines(['chr' + line for line in lines])
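writelines() does not add any separators of its own; the version above works because readlines() keeps the "\n" at the end of each line. A small sketch with io.StringIO objects standing in for the two files (the sample lines are taken from the question):

```python
import io

# In-memory stand-ins for myfile.bg and outfile.bg.
infp = io.StringIO("1   10161   10166   3\n1   10166   10172   2\n")
outfp = io.StringIO()

lines = infp.readlines()  # each element still ends with "\n"
outfp.writelines(["chr" + line for line in lines])  # writes the strings back to back

print(lines)
print(outfp.getvalue())
```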

If the file is really large (close to the size of available memory), you need to process it incrementally:

with open('myfile.bg') as infp:
    with open('outfile.bg', 'w') as outfp:
        for line in infp:
            outfp.write('chr' + line)

(This may be slower than the previous two versions, though.)
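Since the question is about modifying each line of a file, another option is the standard library's fileinput module, which streams the file line by line and writes the result back to the same file instead of a separate outfile.bg. This is an alternative technique, not what the answer above uses; the sketch assumes Python 3 and edits a throwaway file in a temporary directory:

```python
import fileinput
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp, "myfile.bg")
    path.write_text("1   10161   10166   3\n3   11600   11622   2\n")  # sample input

    # inplace=True moves the original aside and redirects print() into a new
    # file with the same name. Each line keeps its "\n", so end="" avoids doubling it.
    with fileinput.input(files=[str(path)], inplace=True) as f:
        for line in f:
            print("chr" + line, end="")

    result = path.read_text()

print(result)
```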

Fully agree with @rychaza; here is my version, based on your code:

file = open("myfile.bg", "r")
out = open("outfile.bg", "w")
for line in file:
    out.write("chr" + line)
out.close()
file.close()

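The problem is that you loop over the input and reassign the same variable (newline) on every line, then open the output file and loop over newline, which is a string, so new ends up being each individual character of that string.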

I think what you need is something like this:

with open('myfile.bg', 'rb') as file:
    with open('outfile.bg', 'wb') as out:
        for line in file:
            # In binary mode each line is bytes, so the prefix must be bytes too
            out.write(b'chr' + line)

When you iterate over a file, each line already includes its trailing newline.

When the block ends, the with statement closes the file handles automatically.
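Both points, plus the need for a bytes prefix in binary mode, can be checked directly. A minimal sketch, assuming Python 3 and a temporary file holding sample data from the question:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp, "myfile.bg")
    src.write_bytes(b"1   10161   10166   3\n2   11564   11586   2\n")  # sample input

    with open(src, "rb") as file:
        lines = list(file)  # each line is a bytes object ending in b"\n"

    # In binary mode the prefix must be bytes; "chr" + lines[0] would raise TypeError.
    prefixed = [b"chr" + line for line in lines]

    print(lines[0])     # b'1   10161   10166   3\n'
    print(file.closed)  # True: leaving the with block closed the handle
```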

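Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost this content, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.

© 2020-2024 STACKOOM.COM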