Repeat row of text by value in column?
I have input like this (simplified to 4 columns; the real data is much larger) in a tab-delimited txt file:
FACTOR→NAME→SURNAME→ADDRESS
1→John→Smith→Chicago
3→Betty→Crawford→New York
2→Tom→Jonson→Chicago
And I want to get this:
FACTOR→NAME→SURNAME→ADDRESS
1→John→Smith→Chicago
3→Betty→Crawford→New York
3→Betty→Crawford→New York
3→Betty→Crawford→New York
2→Tom→Jonson→Chicago
2→Tom→Jonson→Chicago
In other words: I want to repeat each row based on the value in the FACTOR column. This value can only be a number > 0. How can I do this in Python?
You can create an output.txt file for this. Assuming the lines above are the contents of input.txt, you can do something like:
inp = open('input.txt', 'r')
lines = inp.readlines()
inp.close()
out = open('output.txt', 'w')
for line in lines:
    factor = int(line.split('\t')[0])  # split on tab, take the leftmost value, convert it to int
    out.write(line * factor)  # each line keeps its trailing newline, so this writes it factor times
out.close()
With the script provided by Sam Chats I got this error: ValueError: invalid literal for int() with base 10: 'FACTOR'. I guess this happened because the header row was also processed in the loop, so the column name FACTOR was passed to int(). I made some modifications to the script and got what I needed:
inp = open('input.txt', 'r')
out = open('output.txt', 'w')
header = inp.readline()
lines = inp.readlines()
out.write(header)
for line in lines:
    factor = int(line.split('\t')[0])
    out.write(line * factor)
inp.close()
out.close()
Thanks for the tips!
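Since the real data is described as quite large, the same idea can also be sketched without loading every line into memory at once. The function name repeat_rows and its sep parameter are made up for this illustration and are not part of the thread; it also assumes the first line is always the header and that FACTOR is always a valid positive integer.

```python
import io


def repeat_rows(src, dst, sep="\t"):
    """Copy the header from src to dst, then write each data row FACTOR times."""
    dst.write(src.readline())  # assumption: the first line is the header row
    for line in src:  # iterate lazily instead of calling readlines()
        if not line.strip():
            continue  # skip blank lines
        if not line.endswith("\n"):
            line += "\n"  # the last line of a file may lack a trailing newline
        factor = int(line.split(sep, 1)[0])  # leftmost column holds the repeat count
        dst.write(line * factor)


# Demo on in-memory data taken from the question.
sample = io.StringIO(
    "FACTOR\tNAME\tSURNAME\tADDRESS\n"
    "1\tJohn\tSmith\tChicago\n"
    "3\tBetty\tCrawford\tNew York\n"
    "2\tTom\tJonson\tChicago"
)
result = io.StringIO()
repeat_rows(sample, result)
print(result.getvalue(), end="")
```

With real files, the two StringIO objects would be replaced by file objects opened in a with statement, for example with open('input.txt') as inp, open('output.txt', 'w') as out, which also closes both files if an error occurs partway through.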