
Filter a txt file for lines that satisfy some condition in Python?

I have a txt file containing names of the form subjectid_num_[dog/cat]_[option]:

ID1_0123_CAT_ANIMAL_3
ID1_0123_CAT_ANIMAL_GOOD_3
ID1_0123_ABC_3
ID2_1234_CAT_ANIMAL_3
ID2_1234_CAT_ANIMAL_GOOD_3
ID2_1234_DOG_ANIMAL_2
ID2_1234_DOG_ANIMAL_GOOD_0
ID2_1234_ABCD_3
ID3_4321_DOG_ANIMAL_1
ID3_4321_DOG_ANIMAL_GOOD_4
ID3_4321_DOG_3

I want to filter the file to get the lines that satisfy a condition. For example, the code below should output the names that contain both CAT and GOOD, provided that no name in the same group contains both DOG and GOOD. A group is defined by the same subject_id and the same number num. However, the code does not produce my expected output. How should I fix it?

This is my code:

with open("./cat_dog.txt", 'r') as f:
    files_list = [line.rstrip('\n') for line in f]
file_filter = []
for i, cat in enumerate(files_list):
    if 'GOOD' in cat and 'CAT' in cat:
        subject_id = cat.split('_')[0]
        num_id = cat.split('_')[1]
        subject_num = subject_id + '_' + num_id
        for j, dog in enumerate(files_list):
                if subject_num in dog and 'GOOD' in dog:
                    if 'GOOD' in dog and 'DOG' in dog:
                        continue;
                    else:
                        file_filter.append(cat)

The current output is:

ID1_0123_CAT_ANIMAL_GOOD_3
ID2_1234_CAT_ANIMAL_GOOD_3

While the expected output is:

ID1_0123_CAT_ANIMAL_GOOD_3

Your code is wrong. Consider what happens when the outer loop is on the line ID2_1234_CAT_ANIMAL_GOOD_3 and the inner loop reaches that same line:

subject_id = cat.split('_')[0]            #ID2
num_id = cat.split('_')[1]                # 1234
subject_num = subject_id + '_' + num_id   #ID2_1234
for j, dog in enumerate(files_list):
        # when dog is the line ID2_1234_CAT_ANIMAL_GOOD_3
        if subject_num in dog and 'GOOD' in dog:   # this is true
            if 'GOOD' in dog and 'DOG' in dog:   # this is false
                continue;
            else:
                file_filter.append(cat)   # then it outputs it

The problem is that every line with GOOD and CAT in it will "match itself" in the inner loop, so it is always appended.
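One way to keep the original two-loop shape but avoid the self-match is to append a GOOD CAT line only when no line in the same group is a GOOD DOG. The following is a minimal sketch, not necessarily how you would want to structure it; the sample data is inlined from the question instead of being read from the file, and the name `has_good_dog` is only illustrative.

```python
# Sample data copied from the question (inlined instead of reading cat_dog.txt)
files_list = [
    "ID1_0123_CAT_ANIMAL_3",
    "ID1_0123_CAT_ANIMAL_GOOD_3",
    "ID1_0123_ABC_3",
    "ID2_1234_CAT_ANIMAL_3",
    "ID2_1234_CAT_ANIMAL_GOOD_3",
    "ID2_1234_DOG_ANIMAL_2",
    "ID2_1234_DOG_ANIMAL_GOOD_0",
    "ID2_1234_ABCD_3",
    "ID3_4321_DOG_ANIMAL_1",
    "ID3_4321_DOG_ANIMAL_GOOD_4",
    "ID3_4321_DOG_3",
]

file_filter = []
for cat in files_list:
    if 'GOOD' in cat and 'CAT' in cat:
        # Prefix such as "ID2_1234_"; the trailing underscore keeps
        # "ID2_1234_" from also matching a longer number like "ID2_12345_"
        subject_num = '_'.join(cat.split('_')[:2]) + '_'
        # True if ANY line of the same group is a GOOD DOG line
        has_good_dog = any(
            dog.startswith(subject_num) and 'GOOD' in dog and 'DOG' in dog
            for dog in files_list
        )
        if not has_good_dog:
            file_filter.append(cat)

print(file_filter)  # ['ID1_0123_CAT_ANIMAL_GOOD_3']
```

This is still O(n^2), because every GOOD CAT line scans the whole list again.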

IMHO I'd use itertools.groupby. Something along the lines of:

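from itertools import groupby

def group_key(line):
    # Group by (subject_id, num), e.g. ('ID1', '0123')
    return tuple(line.split('_')[:2])

file_filter = []
# groupby only merges adjacent items, so sort by the same key first
for _, lines in groupby(sorted(files_list, key=group_key), key=group_key):
    good_lines = [line for line in lines if 'GOOD' in line]
    if len(good_lines) == 1 and 'CAT' in good_lines[0]:
        file_filter.append(good_lines[0])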

This should also be more efficient, O(n log n) vs O(n^2), although it needs all the contents of the file in RAM.


If you have "classes" other than CAT and DOG and you want to output all GOOD CAT lines unless the same subject_id and num also has a GOOD DOG line, you can modify the code above in this way:

is_good_cat = any('CAT' in line for line in good_lines)
is_good_dog = any('DOG' in line for line in good_lines)
if is_good_cat and not is_good_dog:
    # Test each line, not the list: 'CAT' in good_lines would always be False
    file_filter.extend(line for line in good_lines if 'CAT' in line)

You need to use .extend with a filtering generator because we no longer know which of the good lines to write, so you have to filter them.
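Putting the pieces together, the whole approach might look like the sketch below. The function name `good_cats_without_good_dogs` and the inlined `sample` list are only illustrative. It also assumes that the substrings CAT, DOG and GOOD never occur inside the subject id or other parts of a name.

```python
from itertools import groupby


def group_key(line):
    # Group by (subject_id, num), e.g. ('ID2', '1234')
    return tuple(line.split('_')[:2])


def good_cats_without_good_dogs(lines):
    """Return GOOD CAT lines whose group has no GOOD DOG line."""
    result = []
    # groupby only merges adjacent items, so sort by the same key first
    for _, group in groupby(sorted(lines, key=group_key), key=group_key):
        good_lines = [line for line in group if 'GOOD' in line]
        is_good_cat = any('CAT' in line for line in good_lines)
        is_good_dog = any('DOG' in line for line in good_lines)
        if is_good_cat and not is_good_dog:
            result.extend(line for line in good_lines if 'CAT' in line)
    return result


# Sample data copied from the question
sample = [
    "ID1_0123_CAT_ANIMAL_3",
    "ID1_0123_CAT_ANIMAL_GOOD_3",
    "ID1_0123_ABC_3",
    "ID2_1234_CAT_ANIMAL_3",
    "ID2_1234_CAT_ANIMAL_GOOD_3",
    "ID2_1234_DOG_ANIMAL_2",
    "ID2_1234_DOG_ANIMAL_GOOD_0",
    "ID2_1234_ABCD_3",
    "ID3_4321_DOG_ANIMAL_1",
    "ID3_4321_DOG_ANIMAL_GOOD_4",
    "ID3_4321_DOG_3",
]

print(good_cats_without_good_dogs(sample))  # ['ID1_0123_CAT_ANIMAL_GOOD_3']
```

With the real file, you would pass in the list built by the `with open(...)` block from the question instead of `sample`.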
