Modifying each line in a text file in Python
I have a big file like the example below:
1 10161 10166 3
1 10166 10172 2
1 10172 10182 1
1 10183 10192 1
1 10193 10199 1
1 10212 10248 1
1 10260 10296 1
1 11169 11205 1
1 11336 11372 1
2 11564 11586 2
2 11586 11587 3
2 11587 11600 4
3 11600 11622 2
I would like to add "chr" at the beginning of each line, for example:
chr1 10161 10166 3
chr1 10166 10172 2
chr1 10172 10182 1
chr1 10183 10192 1
chr1 10193 10199 1
chr1 10212 10248 1
chr1 10260 10296 1
chr1 11169 11205 1
chr1 11336 11372 1
chr2 11564 11586 2
chr2 11586 11587 3
chr2 11587 11600 4
chr3 11600 11622 2
I tried the following code in Python:
file = open("myfile.bg", "r")
for line in file:
    newline = "chr" + line
out = open("outfile.bg", "w")
for new in newline:
    out.write("n"+new)
But it did not produce what I wanted. Do you know how to fix the code for this purpose?
The problem with your code is that you iterate over the input file without doing anything with the data you read:
file = open("myfile.bg", "r")
for line in file:
    newline = "chr" + line
The last line assigns each line in myfile.bg to the newline variable (a string, with 'chr' prepended), each line overwriting the previous result.
Then you iterate over the string in newline (which will be the last line in the input file, with 'chr' prepended):
out = open("outfile.bg", "w")
for new in newline:  # <== this iterates over a string, so `new` will be individual characters
    out.write("n"+new)  # this only writes 'n' before each character in newline
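To see that effect in isolation, here is a minimal sketch. The sample string is made up for illustration and stands in for the last line of the input file.

```python
# Hypothetical stand-in for the last line read from the input file.
newline = "chr" + "3 11600 11622 2\n"

# Iterating over a str yields one character at a time, not lines.
pieces = [new for new in newline]

print(pieces[:4])   # ['c', 'h', 'r', '3']
print(len(pieces))  # 19
```

So the second loop in the question runs once per character of that single line, which is why the output file looks nothing like the input.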
If you're just doing this once, e.g. in the shell, you could use the one-liner:
open('outfile.bg', 'w').writelines(['chr' + line for line in open('myfile.bg').readlines()])
More correct (especially in a program, where you would care about open file handles etc.) would be:
with open('myfile.bg') as infp:
    lines = infp.readlines()
with open('outfile.bg', 'w') as outfp:
    outfp.writelines(['chr' + line for line in lines])
If the file is really big (close to the size of your available memory), you'll need to process it incrementally:
with open('myfile.bg') as infp:
    with open('outfile.bg', 'w') as outfp:
        for line in infp:
            outfp.write('chr' + line)
(This may be somewhat slower than the first two versions, though file reads and writes are buffered, so the difference is usually small.)
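The incremental approach can also be wrapped in a small reusable function. This is only a sketch: the function name add_prefix and its parameters are made up for illustration, and the demo writes a two-line sample to a temporary directory rather than touching real data.

```python
import os
import tempfile


def add_prefix(src_path, dst_path, prefix="chr"):
    """Copy src_path to dst_path, prepending prefix to every line."""
    with open(src_path) as infp, open(dst_path, "w") as outfp:
        for line in infp:  # one line at a time, so memory use stays small
            outfp.write(prefix + line)


# Self-contained demo using temporary files.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "myfile.bg")
    dst = os.path.join(tmp, "outfile.bg")
    with open(src, "w") as f:
        f.write("1 10161 10166 3\n2 11564 11586 2\n")
    add_prefix(src, dst)
    with open(dst) as f:
        result = f.read()

print(result)
```

Only one line is held in memory at a time, so the same function should work regardless of file size.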
Totally agree with @rychaza, here's my version using your code:
file = open("myfile.bg", "r")
out = open("outfile.bg", "w")
for line in file:
    out.write("chr" + line)
out.close()
file.close()
The issue is that you iterate over the input and reassign the same variable (newline) for every line, then open a file for output and iterate over newline, which is a string, so new will be each character in that string.
I think something like this should be what you're looking for:
with open('myfile.bg', 'rb') as file:
    with open('outfile.bg', 'wb') as out:
        for line in file:
            # binary mode yields bytes in Python 3, so the prefix must be bytes too
            out.write(b'chr' + line)
When iterating over a file, line already contains the trailing newline.
The with statements will automatically close the file handles when the block ends.
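Both points can be checked with a short sketch. Here io.StringIO is used as a stand-in for a real text file (an assumption made so the example runs without any files on disk), and the sample data is made up.

```python
import io

# io.StringIO stands in for a text file opened with open().
with io.StringIO("1 10161 10166 3\n2 11564 11586 2\n") as fp:
    lines = [line for line in fp]

print(lines)      # ['1 10161 10166 3\n', '2 11564 11586 2\n']
print(fp.closed)  # True
```

Each line keeps its own "\n", so there is no need to add one when writing, and the handle is already closed once the with block exits.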