
Repeat row of text by value in column?

I have input like this (simplified to 4 columns; the real data is quite large) in a tab-delimited txt file:

FACTOR→NAME→SURNAME→ADDRESS
1→John→Smith→Chicago
3→Betty→Crawford→New York
2→Tom→Jonson→Chicago

And I want to get this:

FACTOR→NAME→SURNAME→ADDRESS
1→John→Smith→Chicago
3→Betty→Crawford→New York
3→Betty→Crawford→New York
3→Betty→Crawford→New York
2→Tom→Jonson→Chicago
2→Tom→Jonson→Chicago

In other words, I want to repeat each row based on the value in the FACTOR column. This value is always a number greater than 0. How can I do this in Python?

You can write the result to an output.txt file. Assuming the lines above are the contents of input.txt, you can do something like:

with open('input.txt', 'r') as inp:
    lines = inp.readlines()

with open('output.txt', 'w') as out:
    for line in lines:
        # split on tab and convert the leftmost value to int
        # (the header row is not numeric, so int() raises ValueError on it)
        factor = int(line.split('\t')[0])
        out.write(line * factor)  # write the line the desired number of times

With the script provided by Sam Chats I got this error: ValueError: invalid literal for int() with base 10: 'FACTOR'. I guess this was because the header row, which starts with the column name FACTOR, was also passed through the loop and used as a multiplying factor. I made some modifications to the script and got what I needed:

with open('input.txt', 'r') as inp, open('output.txt', 'w') as out:
    header = inp.readline()  # read the header row separately
    out.write(header)        # copy it to the output unchanged

    for line in inp.readlines():  # the remaining lines are data rows
        factor = int(line.split('\t')[0])  # leftmost column is the repeat count
        out.write(line * factor)

Thanks for the tips!
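
Since the question mentions that the real data is quite large, a variant that streams the file row by row instead of loading everything with readlines() may be worth considering. The following is only a minimal sketch: the function name repeat_rows and its parameters are made up for illustration, and it assumes the first line is the header and that every data row starts with a positive integer. It also adds a newline to a final row that lacks one, so repeated copies do not run together.

```python
import io


def repeat_rows(src, dst, sep="\t"):
    """Copy the header, then write each data row FACTOR times.

    src and dst are open text file objects (hypothetical helper,
    not part of the original answers).
    """
    header = next(src, None)
    if header is None:
        return  # empty input, nothing to do
    dst.write(header)
    for line in src:  # iterating the file reads one line at a time
        if not line.strip():
            continue  # skip blank lines
        # make sure the row ends with a newline before repeating it
        row = line if line.endswith("\n") else line + "\n"
        factor = int(row.split(sep, 1)[0])  # leftmost column is the repeat count
        dst.write(row * factor)


# Small in-memory demo using the sample data from the question
sample = (
    "FACTOR\tNAME\tSURNAME\tADDRESS\n"
    "1\tJohn\tSmith\tChicago\n"
    "3\tBetty\tCrawford\tNew York\n"
    "2\tTom\tJonson\tChicago"
)
result = io.StringIO()
repeat_rows(io.StringIO(sample), result)
print(result.getvalue(), end="")
```

With real files, the same function could be called as repeat_rows(inp, out) inside a block such as with open('input.txt') as inp, open('output.txt', 'w') as out.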
