Delete lines that start with a unique number

Question

I'm learning Python and created this program, but it won't work and I'm hoping someone can find the error!

I have a file that has entries like this:

0 Kurthia sibirica Planococcaceae   
1593 Lactobacillus hordei Lactobacillaceae   
1121 Lactobacillus coleohominis Lactobacillaceae   
614 Lactobacillus coryniformis Lactobacillaceae   
57 Lactobacillus kitasatonis Lactobacillaceae   
3909 Lactobacillus malefermentans Lactobacillaceae

My goal is to remove all the lines that start with a number that occurs only once in the whole file (unique numbers), and save all the lines that start with a number occurring twice or more to a new file. This is my attempt. It doesn't work yet (when I uncomment the print line, one line from the whole file is repeated 3 times and that's it):

#!/usr/bin/env python

infilename = 'v35.clusternum.species.txt'
outfilename = 'v13clusters.no.singletons.txt'

#remove extra letters and spaces
x = 0
with open(infilename, 'r') as infile, open(outfilename, 'w') as outfile:
        for line in infile:
                clu, gen, spec, fam = line.split()
        for clu in line:
                if clu.count > 1:
                        #print line
                        outfile.write(line)
                else:
                    x += 1
print("Number of Singletons:")
print(x)

Thanks for any help!

Answer 1

Okay, your code is kind of headed in the right direction, but you have a few things decidedly confused.

You need to separate what your script is doing into two logical steps: one, aggregating (counting) all of the clu fields; two, writing out each line whose clu count is greater than 1. You tried to do both steps at the same time and... well, it didn't work. You can technically do it that way, but your syntax is wrong. It's also very inefficient to keep searching through the whole file over and over; it's best to read it only once or twice.
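To make that concrete, here is a small sketch (Python 3, using sample lines from the question) of what the original loops actually do. The first loop rebinds `line` on every iteration, so once it finishes only the last line is left; the second loop then iterates over that string's characters, and `clu.count` is the bound `str.count` method rather than a number.

```python
# Sample lines in the question's format (taken from the question).
lines = [
    "0 Kurthia sibirica Planococcaceae\n",
    "1593 Lactobacillus hordei Lactobacillaceae\n",
    "57 Lactobacillus kitasatonis Lactobacillaceae\n",
]

# First loop from the question: each iteration overwrites clu/gen/spec/fam
# and `line`, so after the loop only the last line is still bound.
for line in lines:
    clu, gen, spec, fam = line.split()
print(repr(line))  # the last line only

# Second loop from the question: iterating over a string yields single
# characters, not fields.
chars = [ch for ch in line]
print(chars[:3])  # ['5', '7', ' ']

# `.count` on a string is the str.count method, not a number, so Python 3
# refuses to compare it with an int.
try:
    chars[0].count > 1
except TypeError as exc:
    print("TypeError:", exc)
```

If the script was run under Python 2 (the commented-out `print line` is Python 2 syntax), comparing unrelated types does not raise, so the condition can quietly come out true and the last line gets written repeatedly instead of an error appearing.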

So, let's separate the steps. First, count up your clu fields. The collections module has a Counter class that you can use.
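As a quick illustration of what `Counter` does with the first field of each line, here is a sketch on a few in-memory lines. The data is hypothetical, in the question's format, with `57` deliberately repeated so that something is not a singleton.

```python
from collections import Counter

# Hypothetical sample lines: "57" appears twice, the other numbers once.
lines = [
    "0 Kurthia sibirica Planococcaceae",
    "57 Lactobacillus kitasatonis Lactobacillaceae",
    "1593 Lactobacillus hordei Lactobacillaceae",
    "57 Lactobacillus hordei Lactobacillaceae",
]

# Count how often each cluster number (first whitespace-separated field) occurs.
c = Counter(line.split()[0] for line in lines)

print(c["57"])    # 2
print(c["0"])     # 1
print(c["9999"])  # 0: a missing key counts as zero instead of raising KeyError

# Cluster numbers that occur more than once.
print([clu for clu, n in c.items() if n > 1])  # ['57']
```

The keys are strings because `split()` returns strings; that's fine here, since the cluster numbers are only compared for equality.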

from collections import Counter
with open(infilename, 'r') as infile:
    c = Counter(line.split()[0] for line in infile)

c is now a Counter that you can use to look up the count of a given clu. Then, in a second pass over the file, write out only the lines whose clu occurs more than once:

with open(infilename, 'r') as infile, open(outfilename, 'w') as outfile:
    for line in infile:
        clu, gen, spec, fam = line.split()
        if c[clu] > 1:
            outfile.write(line)
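Putting both passes together, here is a minimal sketch that also reports the singleton count the original script was trying to print. It's wrapped in a function (`filter_singletons` is a hypothetical name) and assumes whitespace-separated columns with the cluster number first; blank lines are skipped rather than crashing `split()[0]`.

```python
from collections import Counter


def filter_singletons(infilename, outfilename):
    """Copy lines whose first field occurs more than once to outfilename.

    Returns the number of singleton lines that were dropped.
    """
    # Pass 1: count how often each cluster number appears (skip blank lines).
    with open(infilename) as infile:
        counts = Counter(line.split()[0] for line in infile if line.strip())

    # Pass 2: write only the lines whose cluster number is repeated.
    singletons = 0
    with open(infilename) as infile, open(outfilename, "w") as outfile:
        for line in infile:
            fields = line.split()
            if not fields:
                continue  # blank line: neither written nor counted
            if counts[fields[0]] > 1:
                outfile.write(line)
            else:
                singletons += 1
    return singletons
```

With the question's file names, `filter_singletons('v35.clusternum.species.txt', 'v13clusters.no.singletons.txt')` writes the filtered file and returns the number of singletons, which you can print just as the original script did.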


Answer 1 (accepted, score 2, 2013-11-24 04:59:53)