Finding duplicates in each row and column
The function needs to check whether any row or any column of a file contains duplicates.
Example of a file with duplicates:
A B C
A A B
B C A
As you can see, there are two A's in row 2, and also two A's in column 1. Code:
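def duplication_char(dc):
    # Read the file and split each line into its whitespace-separated cells
    with open(dc, "r") as duplicatechars:
        linecheck = duplicatechars.readlines()
    linecheck = [line.split() for line in linecheck]
    # A set drops repeats, so a shorter set means the row has duplicates
    for row in linecheck:
        if len(set(row)) != len(row):
            print("duplicates", " ".join(row))
    # zip(*linecheck) transposes the rows into columns
    for column in zip(*linecheck):
        if len(set(column)) != len(column):
            print("duplicates", " ".join(column))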
Well, here is how I would do it.
First, read the file and create a 2D numpy array with its contents:
import numpy

with open('test.txt', 'r') as fil:
    lines = fil.readlines()
# Split every line into cells; this assumes all rows have the same length
lines = [line.strip().split() for line in lines]
arr = numpy.array(lines)
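Then check each row for duplicates using a set. A set contains no duplicates, so if the length of the set differs from the length of the row, the row has duplicates: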
for row in arr:
    if len(set(row)) != len(row):
        print('Duplicates in row: ', row)
Then transpose the numpy array and check each column for duplicates with a set in the same way:
for col in arr.T:
    if len(set(col)) != len(col):
        print('Duplicates in column: ', col)
Wrapping all of this in a function:
def check_for_duplicates(filename):
    import numpy
    with open(filename, 'r') as fil:
        lines = fil.readlines()
    lines = [line.strip().split() for line in lines]
    arr = numpy.array(lines)
    # Check the rows
    for row in arr:
        if len(set(row)) != len(row):
            print('Duplicates in row: ', row)
    # Check the columns (rows of the transposed array)
    for col in arr.T:
        if len(set(col)) != len(col):
            print('Duplicates in column: ', col)
As Apero suggested, you can also do this with zip ( https://docs.python.org/3/library/functions.html#zip ) instead of numpy:
def check_for_duplicates(filename):
    with open(filename, 'r') as fil:
        lines = fil.readlines()
    lines = [line.strip().split() for line in lines]
    # Check the rows
    for row in lines:
        if len(set(row)) != len(row):
            print('Duplicates in row: ', row)
    # zip(*lines) transposes the list of rows into columns
    for col in zip(*lines):
        if len(set(col)) != len(col):
            print('Duplicates in column: ', col)
For your example, the numpy version prints:
# Duplicates in row: ['A' 'A' 'B']
# Duplicates in column: ['A' 'A' 'B']
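One caveat with the zip-based version: zip stops at the shortest row, so if the rows have different lengths, the extra columns are silently skipped. A minimal sketch of one way around that, assuming missing cells should simply be ignored, uses itertools.zip_longest. The function name find_duplicate_lines and the 1-based numbering are illustrative choices, not part of the answer above.

```python
from itertools import zip_longest

_MISSING = object()  # sentinel marking cells that a short row does not have


def find_duplicate_lines(rows):
    """Return (duplicate_rows, duplicate_columns) as lists of 1-based indices."""
    dup_rows = [i for i, row in enumerate(rows, start=1)
                if len(set(row)) != len(row)]
    dup_cols = []
    # zip_longest pads short rows with the sentinel instead of truncating
    for j, col in enumerate(zip_longest(*rows, fillvalue=_MISSING), start=1):
        cells = [c for c in col if c is not _MISSING]
        if len(set(cells)) != len(cells):
            dup_cols.append(j)
    return dup_rows, dup_cols


rows = [line.split() for line in ["A B C", "A A B", "B C A"]]
print(find_duplicate_lines(rows))  # ([2], [1])
```

Returning the indices instead of printing them also makes the check easier to reuse and test.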
You can keep the data as a list of lists and use zip to transpose it.
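To make the transposition concrete, here is a small sketch using the example grid from the question; the variable names are illustrative.

```python
rows = [["A", "B", "C"],
        ["A", "A", "B"],
        ["B", "C", "A"]]

# zip(*rows) groups the i-th element of every row, which yields the columns
columns = list(zip(*rows))
print(columns)
# [('A', 'A', 'B'), ('B', 'A', 'C'), ('C', 'B', 'A')]
```

Note that zip yields tuples, and in Python 3 it is a lazy iterator, so it is wrapped in list() here to inspect the result.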
Using your example, try:
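from collections import Counter

# fn is the path to the input file
with open(fn) as fin:
    data = [line.split() for line in fin]

rowdups = {}
coldups = {}
# Run the same counting pass over the rows, then over the transposed columns
for d, m in ((rowdups, data), (coldups, zip(*data))):
    for i, sl in enumerate(m):
        count = Counter(sl)
        for c in count.most_common():
            if c[1] > 1:
                # Record (value, count) under the zero-based row/column index
                d.setdefault(i, []).append(c)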
>>> rowdups
{1: [('A', 2)]}
>>> coldups
{0: [('A', 2)]}
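The keys in rowdups and coldups are zero-based indices, so row 2 of the file shows up as key 1. A possible sketch for turning those dictionaries into readable messages follows; report is a hypothetical helper name, not part of the answer above.

```python
def report(kind, dups):
    """Format a {index: [(value, count), ...]} mapping as readable lines."""
    lines = []
    for index in sorted(dups):
        for value, count in dups[index]:
            # index is zero-based, so add 1 for human-friendly numbering
            lines.append(f"{kind} {index + 1}: {value!r} appears {count} times")
    return lines


rowdups = {1: [('A', 2)]}
coldups = {0: [('A', 2)]}
print("\n".join(report("row", rowdups) + report("column", coldups)))
# row 2: 'A' appears 2 times
# column 1: 'A' appears 2 times
```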