
Finding duplicates in each row and column


The function needs to check whether any row or any column in the file contains duplicates.

Example of a file with duplicates:

A B C
A A B
B C A

As you can see, row 2 contains two A's, and column 1 also contains two A's. Code:

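def duplication_char(dc):
    # Read the file and split each line into a list of items
    with open(dc, "r") as duplicatechars:
        linecheck = duplicatechars.readlines()
    linecheck = [line.split() for line in linecheck]

    # A set drops repeats, so a shorter set means the row has duplicates
    for row in linecheck:
        if len(set(row)) != len(row):
            print("duplicates", " ".join(row))

    # zip(*rows) transposes the rows into columns
    for column in zip(*linecheck):
        if len(set(column)) != len(column):
            print("duplicates", " ".join(column))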
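A minimal sketch of the same idea that returns the duplicates instead of printing them, which makes the result easier to test or reuse. The function name `find_duplicates` and its return format are illustrative assumptions, not part of the original code; rows and columns are numbered from 1 to match the description above.

```python
def find_duplicates(lines):
    """Return (row_numbers, column_numbers) that contain a repeated item.

    `lines` is any iterable of whitespace-separated strings, e.g. an open file.
    Numbering starts at 1 to match "row 2" / "column 1" in the question.
    """
    grid = [line.split() for line in lines if line.strip()]  # skip blank lines
    dup_rows = [i for i, row in enumerate(grid, start=1)
                if len(set(row)) != len(row)]
    dup_cols = [j for j, col in enumerate(zip(*grid), start=1)
                if len(set(col)) != len(col)]
    return dup_rows, dup_cols


example = ["A B C\n", "A A B\n", "B C A\n"]
print(find_duplicates(example))  # ([2], [1])
```

With a real file, pass the open file object directly, e.g. `with open(dc) as f: print(find_duplicates(f))`.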

Well, here's how I would do it.

First, read the file and create a 2D numpy array of its contents:

import numpy
with open('test.txt', 'r') as fil:
    lines = fil.readlines()
lines = [line.strip().split() for line in lines]
arr = numpy.array(lines)
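To try the parsing step without creating `test.txt`, the same list comprehension can be run on an in-memory copy of the example file. In this sketch `io.StringIO` stands in for the file, and the result is left as a plain list of lists (no numpy) so only the standard library is needed.

```python
import io

# In-memory stand-in for test.txt (the example file from the question)
fil = io.StringIO("A B C\nA A B\nB C A\n")
lines = fil.readlines()
# strip() removes the trailing newline; split() breaks each line on whitespace
lines = [line.strip().split() for line in lines]
print(lines)  # [['A', 'B', 'C'], ['A', 'A', 'B'], ['B', 'C', 'A']]
```

Note that `str.split()` with no argument already discards leading and trailing whitespace, so the `strip()` call is harmless but not strictly required.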

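Then use a set to check each row for duplicates (a set cannot contain duplicates, so if the length of the set differs from the length of the row, the row contains duplicates):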

for row in arr:
    if len(set(row)) != len(row):
        print('Duplicates in row: ', row)
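The set-length test on its own, as a small helper applied to the example rows (the name `has_duplicates` is only for illustration):

```python
def has_duplicates(seq):
    # A set keeps one copy of each item, so it is shorter than seq
    # exactly when some item appears more than once.
    return len(set(seq)) != len(seq)


rows = [["A", "B", "C"], ["A", "A", "B"], ["B", "C", "A"]]
print([has_duplicates(row) for row in rows])  # [False, True, False]
```

This relies on the items being hashable, which holds for the strings produced by `split()`.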

Then transpose the numpy array and use the same set check on each column:

for col in arr.T:
    if len(set(col)) != len(col):
        print('Duplicates in column: ', col)
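Transposing is a convenience: column `j` is simply the `j`-th item of every row. A standard-library sketch of the same column check, assuming every row has the same length:

```python
rows = [["A", "B", "C"], ["A", "A", "B"], ["B", "C", "A"]]

dup_cols = []
for j in range(len(rows[0])):
    # Build column j by taking the j-th element of each row
    col = [row[j] for row in rows]
    if len(set(col)) != len(col):
        dup_cols.append(col)

print(dup_cols)  # [['A', 'A', 'B']]
```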

Wrapping all of this in a function:

def check_for_duplicates(filename):
    import numpy
    # Read the file into a 2D array of items
    with open(filename, 'r') as fil:
        lines = fil.readlines()
    lines = [line.strip().split() for line in lines]
    arr = numpy.array(lines)

    # Rows: a set shorter than the row means a repeated item
    for row in arr:
        if len(set(row)) != len(row):
            print('Duplicates in row: ', row)

    # Columns: the transposed array's rows are the original columns
    for col in arr.T:
        if len(set(col)) != len(col):
            print('Duplicates in column: ', col)

Following Apero's suggestion, you can also do this without numpy by using zip ( https://docs.python.org/3/library/functions.html#zip ):

def check_for_duplicates(filename):
    with open(filename, 'r') as fil:
        lines = fil.readlines()
    lines = [line.strip().split() for line in lines]

    for row in lines:
        if len(set(row)) != len(row):
            print('Duplicates in row: ', row)

    # zip(*lines) yields the columns as tuples
    for col in zip(*lines):
        if len(set(col)) != len(col):
            print('Duplicates in column: ', col)
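One caveat with `zip`: it stops at the shortest row, so if the file has rows of different lengths, the extra items are silently left out of the column check. Below is a sketch using `itertools.zip_longest` instead, assuming missing cells should simply be ignored; `duplicate_columns` and the `_MISSING` sentinel are hypothetical names introduced here so that padding is never mistaken for a real duplicate.

```python
from itertools import zip_longest

_MISSING = object()  # hypothetical sentinel used only to pad short rows


def duplicate_columns(rows):
    """Return the columns (as lists) that contain a repeated item."""
    result = []
    for col in zip_longest(*rows, fillvalue=_MISSING):
        # Drop the padding so that absent cells are not counted as items
        cells = [c for c in col if c is not _MISSING]
        if len(set(cells)) != len(cells):
            result.append(cells)
    return result


rows = [["A", "B", "X"], ["B", "A"], ["C", "D", "X"]]
print(duplicate_columns(rows))  # [['X', 'X']]
# Plain zip never reaches the third column, so it finds nothing:
print([col for col in zip(*rows) if len(set(col)) != len(col)])  # []
```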

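For your example, the numpy version prints the following (the zip version prints the same row as a list and the column as a tuple):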

# Duplicates in row:  ['A' 'A' 'B']
# Duplicates in column:  ['A' 'A' 'B']

You can keep the data as a list of lists and transpose it with zip.
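A short sketch of what the transposition produces on the example data, and one Python 3 detail worth knowing: `zip()` returns a one-shot iterator, so wrap it in `list()` if the columns are needed more than once.

```python
data = [["A", "B", "C"], ["A", "A", "B"], ["B", "C", "A"]]

# zip(*data) passes each row as a separate argument, so zip pairs up
# the first items, then the second items, and so on: rows become columns.
columns = list(zip(*data))
print(columns)  # [('A', 'A', 'B'), ('B', 'A', 'C'), ('C', 'B', 'A')]

# A bare zip object is exhausted after one pass.
it = zip(*data)
print(len(list(it)), len(list(it)))  # 3 0
```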

Using your example, try:

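from collections import Counter

fn = 'test.txt'  # hypothetical: path to your input file

with open(fn) as fin:
    data = [line.split() for line in fin]

rowdups = {}  # row index -> [(item, count), ...] for repeated items
coldups = {}  # column index -> [(item, count), ...] for repeated items
for d, m in ((rowdups, data), (coldups, zip(*data))):
    for i, sl in enumerate(m):
        count = Counter(sl)
        for c in count.most_common():
            if c[1] > 1:  # c is an (item, count) pair
                d.setdefault(i, []).append(c)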

>>> rowdups 
{1: [('A', 2)]}
>>> coldups 
{0: [('A', 2)]} 
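The indices in `rowdups` and `coldups` come from `enumerate`, so they start at 0: row index 1 is the question's "row 2" and column index 0 is "column 1". The sketch below shifts to 1-based numbering and also records where each duplicate occurs; the function name `duplicate_positions` and its output format are assumptions for illustration only.

```python
from collections import Counter


def duplicate_positions(lines):
    """Map ('row'|'column', 1-based number) -> {item: [1-based positions]}."""
    data = [line.split() for line in lines]
    report = {}
    for kind, seqs in (("row", data), ("column", zip(*data))):
        for i, seq in enumerate(seqs, start=1):
            counts = Counter(seq)
            # Keep only the items that occur more than once in this row/column
            repeated = {item for item, n in counts.items() if n > 1}
            if repeated:
                report[(kind, i)] = {
                    item: [k for k, x in enumerate(seq, start=1) if x == item]
                    for item in sorted(repeated)
                }
    return report


print(duplicate_positions(["A B C", "A A B", "B C A"]))
# {('row', 2): {'A': [1, 2]}, ('column', 1): {'A': [1, 2]}}
```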
