
Delete lines starting with a unique number

I'm learning Python and created this program, but it doesn't work, and I'm hoping someone can find the error!

I have a file that has entries like this:

0 Kurthia sibirica Planococcaceae   
1593 Lactobacillus hordei Lactobacillaceae   
1121 Lactobacillus coleohominis Lactobacillaceae   
614 Lactobacillus coryniformis Lactobacillaceae   
57 Lactobacillus kitasatonis Lactobacillaceae   
3909 Lactobacillus malefermentans Lactobacillaceae

My goal is to remove all the lines that start with a number that occurs only once in the whole file (unique numbers), and save all the lines that start with a number occurring twice or more to a new file. This is my attempt. It doesn't work yet (when I enable the print line, one line from the whole file is repeated 3 times and that's it):

#!/usr/bin/env python

infilename = 'v35.clusternum.species.txt'
outfilename = 'v13clusters.no.singletons.txt'

#remove extra letters and spaces
x = 0
with open(infilename, 'r') as infile, open(outfilename, 'w') as outfile:
        for line in infile:
                clu, gen, spec, fam = line.split()
        for clu in line:
                if clu.count > 1:
                        #print line
                        outfile.write(line)
                else:
                    x += 1
print("Number of Singletons:")
print(x)

Thanks for any help!

Okay, your code is kind of headed in the right direction, but you have a few things decidedly confused.

You need to separate what your script does into two logical steps: first, aggregating (counting) all of the clu fields; second, writing each line whose clu count is greater than 1. You tried to do both steps at the same time and, well, it didn't work. You can technically do it that way, but you have the syntax wrong. It's also terribly inefficient to search through your file over and over. Best to read it only once or twice.

So, let's separate the steps. First, count up your clu fields. The collections module has a Counter that you can use.

from collections import Counter
with open(infilename, 'r') as infile:
    c = Counter(line.split()[0] for line in infile)

c is now a Counter that you can use to look up the count of a given clu. Second, read the file again and write out only the lines whose clu occurs more than once:
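To see what the Counter lookup gives you, here is a small sketch on in-memory data. The sample lines are adapted from the question, with one cluster number changed (an assumption, purely for illustration) so that 1593 appears twice:

```python
from collections import Counter

# Made-up sample: cluster number 1593 appears twice, the others once.
sample_lines = [
    "0 Kurthia sibirica Planococcaceae\n",
    "1593 Lactobacillus hordei Lactobacillaceae\n",
    "1593 Lactobacillus coleohominis Lactobacillaceae\n",
    "614 Lactobacillus coryniformis Lactobacillaceae\n",
]

# Count the first whitespace-separated field of every line.
counts = Counter(line.split()[0] for line in sample_lines)

print(counts["1593"])  # 2
print(counts["0"])     # 1
print(counts["9999"])  # 0: a Counter returns 0 for keys it has never seen
```

Note that the keys are strings ("1593", not 1593), because split() returns strings.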

with open(infilename, 'r') as infile, open(outfilename, 'w') as outfile:
    for line in infile:
        clu, gen, spec, fam = line.split()
        # Keep only lines whose cluster number occurs more than once.
        if c[clu] > 1:
            outfile.write(line)
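Putting the two passes together, something like the following might work. This is only a sketch: the function name drop_singletons is made up for illustration, and skipping blank lines is an assumption on my part, since I don't know whether your file has any. It also returns the singleton count that your original script was trying to print.

```python
from collections import Counter


def drop_singletons(in_path, out_path):
    """Copy lines whose first field occurs more than once.

    Returns the number of singleton lines that were dropped.
    (Hypothetical helper name, not part of any library.)
    """
    # Pass 1: count the first field of every non-blank line.
    with open(in_path) as infile:
        counts = Counter(line.split()[0] for line in infile if line.strip())

    # Pass 2: write the lines whose cluster number occurs at least twice.
    singletons = 0
    with open(in_path) as infile, open(out_path, "w") as outfile:
        for line in infile:
            if not line.strip():
                continue  # assumption: blank lines are ignored
            if counts[line.split()[0]] > 1:
                outfile.write(line)
            else:
                singletons += 1
    return singletons
```

You would call it as drop_singletons(infilename, outfilename) and print the returned value. Using only split()[0] instead of unpacking four names means a line with an extra or missing field won't raise a ValueError.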
