
Removing columns from a txt file

I'm a beginner in Python and I'm a bit stuck on a trivial problem. I would like to remove some columns and strings from a tab-separated text file. The input file is called A.txt:

chr1_1792868_SNP    Bcin01g04980    NON_SYNONYMOUS  NON_SYNONYMOUS[T](gene:Bcin01g04980|transcript:Bcin01g04980.1|P->S:225) C   T   C/C C/C C/C C/C C/C C/T 234 233 232 219 233 221 234 233 232 219 233 23  0   0   0   0   0   198

And the output file (let's call it B.txt) should look like this:

1   1792868 Bcin01g04980    C   T   C/C C/C C/C C/C C/C C/T 234 233 232 219 233 221 234 233 232 219 233 23  0   0   0   0   0   198

So I need to perform several operations:

  • Remove the "chr" and "_" strings from the first column
  • Split the 1 after "chr" and the number that follows it into 2 separate columns
  • Remove columns 3 and 4 entirely

So far I have tried:

with  open ('A.txt', 'r') as mutmut_mutants:
        dble_mut = csv.reader(mutmut_mutants, delimiter='\t')
        with open('B.txt', 'w+') as mutants_coo:
            mut_coo= csv.writer(mutants_coo)
            for i in dble_mut:
                del i[2]
                del i[3]
                mut_coov.writerow( i )

But, big surprise, it's not working. And I'm not splitting the first string into 2 columns. Does anyone have an idea how to proceed?

Thanks a lot!
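For reference, several things in the attempt above would each stop it from working: `csv` is never imported, the writer is bound to `mut_coo` but called as `mut_coov`, the writer has no `delimiter='\t'` so it would emit commas, and after `del i[2]` the remaining fields shift left, so `del i[3]` removes what was originally the fifth field rather than the fourth. Below is a minimal corrected sketch, assuming Python 3 and that the first field always starts with `chr`. The function names `convert_row` and `convert_file` are hypothetical, chosen only for illustration.

```python
import csv

def convert_row(row):
    """Turn ['chr1_1792868_SNP', gene, col3, col4, ...] into
    ['1', '1792868', gene, ...], dropping columns 3 and 4."""
    chrom, position = row[0].split('_')[:2]
    chrom = chrom[len('chr'):]  # assumption: the field starts with 'chr'
    # Slicing builds a new list, so no indices shift while we work
    return [chrom, position, row[1]] + row[4:]

def convert_file(src, dst):
    """Read tab-separated rows from src and write converted rows to dst."""
    # newline='' is what the csv module documentation recommends for file objects
    with open(src, newline='') as infile, open(dst, 'w', newline='') as outfile:
        reader = csv.reader(infile, delimiter='\t')
        writer = csv.writer(outfile, delimiter='\t', lineterminator='\n')
        for row in reader:
            if row:  # skip blank lines
                writer.writerow(convert_row(row))
```

With these definitions, `convert_file('A.txt', 'B.txt')` should produce the tab-separated B.txt described in the question.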

You can try this:

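# Read every line and split it into fields on whitespace
with open('data.txt') as infile:
    f = [i.strip('\n').split() for i in infile]

new_data = []

for i in f:
    # 'chr1_1792868_SNP' -> ['chr1', '1792868', 'SNP']
    data1 = i[0].split("_")
    # Chromosome number (last character of 'chr1') and position
    new = data1[0][-1]+" "+data1[1]+" "
    # Gene ID (column 2)
    new += i[1]+" "
    # Everything after columns 3 and 4
    new += ' '.join(i[4:])
    new_data.append(new)

print(new_data[0])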

Output: 输出:

'1 1792868 Bcin01g04980 C T C/C C/C C/C C/C C/C C/T 234 233 232 219 233 221 234 233 232 219 233 23 0 0 0 0 0 198'
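Note that `data1[0][-1]` keeps only the last character of the chromosome label, so a label such as `chr12` (hypothetical here, since the sample data only shows single-digit chromosomes) would come out as `2`. A small sketch of a variant that strips the `chr` prefix instead, assuming Python 3; `split_locus` is an illustrative name, not part of the answer above:

```python
def split_locus(locus):
    """Split 'chr12_19718_SNP' into ('12', '19718')."""
    chrom, position = locus.split('_')[:2]
    if chrom.startswith('chr'):
        chrom = chrom[3:]  # drop the 'chr' prefix, keep every digit
    return chrom, position

print(split_locus('chr1_1792868_SNP'))  # ('1', '1792868')
print(split_locus('chr12_19718_SNP'))   # ('12', '19718')
```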

Possible solution:

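# Note: this handles a file containing a single line
with open('A.txt', 'r') as f:
    data = f.read()

# Drop the trailing newline, then split on tabs
columns = data.strip().split('\t')
result = []

# 'chr1_1792868_SNP' -> ['chr1', '1792868', 'SNP']
temp = columns[0].split('_')
result.append(temp[0][-1])   # chromosome number
result.append(temp[1])       # position

result.append(columns[1])    # gene ID, kept in the expected output
result.extend(columns[4:])   # skip columns 3 and 4

print(result)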

Thanks to the code provided above (thanks @Ajax1234 and @doctorlove), I managed to get what I want in a list. I have some trouble saving it properly to a file. I want it to be tab delimited, with each element of the list on a new line. The code is:

f = open('mutmut_mutants.txt').readlines()

f = [i.strip('\n').split() for i in f]

new_data = []

for i in f:
    data1 = i[0].split("_")
    new = data1[0][-1]+" "+data1[1]+" "

    new += i[1]+" "

    new += ' '.join(i[4:])

    new_data.append(new)
print new_data

outfile = open("test.txt", "w")
print >> outfile, "\t".join(str(i) for i in new_data)
outfile.close()

My new_data list looks like this:

['1 1792868 Bcin01g04980 CTC/CC/CC/CC/CC/CC/T 234 233 232 219 233 221 234 233 232 219 233 23 0 0 0 0 0 198', '1 1792869 Bcin01g04980 CTC/CC/TC/TC/TC/TC/T 240 236 233 220 232 220 240 96 66 80 30 25 0 140 166 140 202 194', '2 19718 Bcin02g00005 CAC/AC/AC/AC/AC/AC/A 86 51 78 84 87 108 63 38 58 60 63 86 22 13 20 24 24 22','....','....','...']

And the output in the text file looks OK, except that there is no newline at the end of each element of the list:

1 1792868 Bcin01g04980 CTC/CC/CC/CC/CC/CC/T 234 233 232 219 233 221 234 233 232 219 233 23 0 0 0 0 0 198 1 1792869 Bcin01g04980 ...

Thanks for your help!
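The single line in the output most likely comes from `"\t".join(...)` being applied to the whole list: it puts a tab between records and never writes a newline. One possible fix, sketched under the assumptions that Python 3 is used and that each element of `new_data` is a space-separated string as shown above, is to swap the spaces inside each record for tabs and end every record with a newline. `format_records` and `save_records` are hypothetical helper names.

```python
def format_records(records):
    """Return the records as text: tabs between fields, one record per line."""
    return ''.join('\t'.join(record.split()) + '\n' for record in records)

def save_records(records, path):
    """Write the formatted records to the file at path."""
    with open(path, 'w') as outfile:
        outfile.write(format_records(records))
```

Calling `save_records(new_data, "test.txt")` after the loop would then replace the `print >> outfile` line.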
