Delete lines starting with a unique number
I'm learning Python and created this program, but it won't work and I'm hoping someone can find the error!
I have a file that has entries like this:
0 Kurthia sibirica Planococcaceae
1593 Lactobacillus hordei Lactobacillaceae
1121 Lactobacillus coleohominis Lactobacillaceae
614 Lactobacillus coryniformis Lactobacillaceae
57 Lactobacillus kitasatonis Lactobacillaceae
3909 Lactobacillus malefermentans Lactobacillaceae
My goal is to remove all the lines that start with a number that occurs only once in the whole file (unique numbers), and save all the lines that start with a number occurring twice or more to a new file. This is my attempt. It doesn't work yet (when I enable the print line, one line from the whole file is repeated 3 times and that's it):
    #!/usr/bin/env python
    infilename = 'v35.clusternum.species.txt'
    outfilename = 'v13clusters.no.singletons.txt'
    #remove extra letters and spaces
    x = 0
    with open(infilename, 'r') as infile, open(outfilename, 'w') as outfile:
        for line in infile:
            clu, gen, spec, fam = line.split()
            for clu in line:
                if clu.count > 1:
                    #print line
                    outfile.write(line)
                else:
                    x += 1
    print("Number of Singletons:")
    print(x)
Thanks for any help!
Okay, your code is kind of headed in the right direction, but you have a few things decidedly confused.
You need to separate what your script is doing into two logical steps: one, aggregating (counting) all of the clu fields. Two, writing each line whose clu count is > 1. You tried to do these steps together at the same time and... well, it didn't work. You can technically do it that way, but you have the syntax wrong: for clu in line loops over the characters of the line, and clu.count is a method that never gets called. It's also terribly inefficient to continuously search through your file for stuff. Best to only do it once or twice.
So, let's separate the steps. First, count up your clu fields. The collections module has a Counter that you can use.
    from collections import Counter
    with open(infilename, 'r') as infile:
        c = Counter(line.split()[0] for line in infile)
c is now a Counter that you can use to look up the count of a given clu.
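To see what those lookups look like, here is a small sketch on an in-memory list instead of your file. The sample lines are hypothetical: the cluster numbers are invented so that 57 repeats, which none of the lines you posted do.

```python
from collections import Counter

# Hypothetical sample lines; cluster numbers are made up so that 57 repeats.
sample = [
    "0 Kurthia sibirica Planococcaceae",
    "57 Lactobacillus hordei Lactobacillaceae",
    "57 Lactobacillus kitasatonis Lactobacillaceae",
]

# Count how often each first field (the cluster number) occurs.
c = Counter(line.split()[0] for line in sample)

print(c["57"])   # count for a cluster that appears on two lines
print(c["0"])    # count for a cluster that appears once (a singleton)
print(c["999"])  # a key that was never seen counts as 0, no KeyError
```

Note that the keys are strings ("57", not 57), because line.split() returns strings. With the counts in hand, the second pass just checks each line's clu against c: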
    with open(infilename, 'r') as infile, open(outfilename, 'w') as outfile:
        for line in infile:
            clu, gen, spec, fam = line.split()
            if c[clu] > 1:
                outfile.write(line)
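If you also want the singleton count your original script was printing, both passes could be wrapped up roughly like this. It's only a sketch: remove_singletons is a name I made up, and it assumes the cluster number is always the first whitespace-separated field. It uses line.split()[0] rather than unpacking four names, so a line with more or fewer than four fields won't raise a ValueError, and it skips blank lines.

```python
from collections import Counter


def remove_singletons(infilename, outfilename):
    """Copy lines whose first field occurs more than once.

    Returns the number of singleton lines that were left out.
    (Hypothetical helper, not part of any library.)
    """
    # Pass 1: count every cluster number, ignoring blank lines.
    with open(infilename, 'r') as infile:
        c = Counter(line.split()[0] for line in infile if line.strip())

    # Pass 2: write the non-singleton lines and tally the rest.
    singletons = 0
    with open(infilename, 'r') as infile, open(outfilename, 'w') as outfile:
        for line in infile:
            if not line.strip():
                continue  # skip blank lines
            if c[line.split()[0]] > 1:
                outfile.write(line)
            else:
                singletons += 1
    return singletons
```

You would call it as x = remove_singletons(infilename, outfilename) and then print x. The file is read twice, which is fine here; it avoids holding every line in memory at once.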