Check for duplicates in list of strings

I want to check whether any of the strings in a column of arbitrary length are duplicated. If so, Python should print the line on which the duplicate occurs.

My code is as follows:

import numpy as np
data = np.array([["s154090","Lis",1,0],["s151515","Lars",2,3],["s151515","Preben",1,0],["s154080","Rene",5,7]])

def sortGrades(data):

    studentId = data[:,0]
    xs = studentId
    s = set()
    if any(i in s or s.add(i) for i in xs):
        s = set()
        duplicates = set(i for i in xs if i in s or s.add(i))
        print("Error in line {},".format(i),"Det følgende Studie ID går igen",duplicates)
    else:
        print("Ingen Fejl")
        return ""

But it doesn't work, because i isn't defined:

---> 11 print("Error in line {},".format(i),"Det følgende Studie ID går igen",duplicates)

NameError: name 'i' is not defined

I am using Python 3.5.

Apart from the exception, your approach is more complicated than it needs to be. For example, you only need one pass over the data:

def sortGrades(data):
    studentId = data[:,0]
    xs = studentId
    s = set()
    for line, val in enumerate(xs):
        if val in s:  # if the current value was already seen print the error message
            print("Error in line {},".format(line),"Det følgende Studie ID går igen", val)
        # Add the value
        s.add(val)

>>> sortGrades(data)
Error in line 2, Det følgende Studie ID går igen s151515
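
If you also want to know where the first occurrence was, one option is to collect every position per ID and keep only the IDs that occur more than once. This is a minimal sketch, not part of the original answer: the function name find_duplicate_lines and the plain list student_ids are hypothetical, and the line numbers are the 0-based indices that enumerate produces.

```python
from collections import defaultdict


def find_duplicate_lines(ids):
    """Map each repeated ID to the list of 0-based lines where it occurs."""
    positions = defaultdict(list)
    for line, val in enumerate(ids):
        positions[val].append(line)
    # Keep only the IDs that were seen on more than one line.
    return {val: lines for val, lines in positions.items() if len(lines) > 1}


# Hypothetical sample data mirroring the first column of the question's array.
student_ids = ["s154090", "s151515", "s151515", "s154080"]
for val, lines in find_duplicate_lines(student_ids).items():
    print("ID {} appears on lines {}".format(val, lines))
```

This is still a single pass over the data, at the cost of storing one list of positions per distinct ID.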

The exception occurs because you try to access the loop variable of a comprehension (here, a generator expression) outside its scope, which is impossible, at least in Python 3.x. As soon as the comprehension has finished, i is no longer accessible.

That's why I used an explicit for loop. That way you can access the loop variables.
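
The scoping difference can be checked with a small, self-contained sketch. The names xs, item, leaked, line and val are only illustrative and are assumed not to be defined anywhere else.

```python
xs = ["a", "b", "a"]

seen = set()
# The generator expression has its own scope: `item` exists only inside it.
has_duplicates = any(item in seen or seen.add(item) for item in xs)

try:
    item  # looking up the generator's loop variable from the outside
except NameError:
    leaked = False
else:
    leaked = True

# The variables of a regular for loop live in the enclosing scope,
# so they are still available after the loop has finished.
for line, val in enumerate(xs):
    pass

print(has_duplicates, leaked, line, val)
```

Under those assumptions, has_duplicates is True, leaked is False, and line and val still hold the last pair produced by enumerate.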


If you want to be really lazy, you could also use a function from an external module that I authored, iteration_utilities.duplicates:

from iteration_utilities import duplicates
from operator import itemgetter

for line, val in duplicates(enumerate(data[:,0]), key=itemgetter(1)):
    print("Error in line {},".format(line),"Det følgende Studie ID går igen", val)
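
If you would rather not install an extra package, a similar helper can be sketched in plain Python. This is not the implementation used by iteration_utilities; iter_duplicates is a hypothetical name, and the sketch assumes the keys are hashable so they can be stored in a set.

```python
def iter_duplicates(iterable, key=None):
    """Yield every item whose key was already seen earlier in the iterable."""
    seen = set()
    for item in iterable:
        k = item if key is None else key(item)
        if k in seen:
            yield item
        else:
            seen.add(k)


# Hypothetical sample data mirroring the first column of the question's array.
student_ids = ["s154090", "s151515", "s151515", "s154080"]
for line, val in iter_duplicates(enumerate(student_ids), key=lambda pair: pair[1]):
    print("Error in line {},".format(line), "Det følgende Studie ID går igen", val)
```

Because it is a generator, nothing is computed until you iterate over it, and only the set of keys seen so far is kept in memory.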
