
Python | regex | String Validation

import re
u,d,c=0,0,0
n=int(input())
for i in range(0,n):
    uid=str(input())
    uid = "".join(sorted(uid))
    if (len(uid)<=10):   
        for i in uid:
            if(re.search("[a-z]", uid)):
                flag=-1
            if(re.search("[0-9]", uid)):
                flag=-1
            if(re.search("[A-Z]", uid)):
                flag=-1
            if(uid.count(i)>1):
                c+=1
            if(i.isupper()):  #uppercase
                u+=1
            if(i.isdigit()):   
                    d+=1    
    if(u>=2 and d>=3 and flag==-1 and c==0):
        print("Valid")
    else:
        print("Invalid")

The code above is meant to validate a UID (a string).
When I pass two values and the first one is invalid, it is correctly reported as "Invalid", but the next value is then also reported as "Invalid", even when it is valid. If the first value is valid, it prints "Valid", and if the next value is invalid, it correctly prints "Invalid".
In my test, the first value was invalid because of repeated characters, but the second value was valid, and the output still showed "Invalid".


Rules for validating a UID:

It must contain at least 3 digits (0-9).
It must contain at least 2 uppercase English alphabet characters.
It should only contain alphanumeric characters (a-z, A-Z and 0-9).
No character should repeat.
There must be exactly 10 characters in a valid UID.
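For comparison, the five rules can also be checked directly in plain Python, without a regex. This is a minimal sketch; the function name is_valid_uid is hypothetical, and it assumes "alphanumeric" means ASCII letters and digits only, as the a-z, A-Z and 0-9 in the rule suggests.

```python
import string

# Only ASCII letters and digits are allowed (a-z, A-Z, 0-9).
ALLOWED = set(string.ascii_letters + string.digits)


def is_valid_uid(uid: str) -> bool:
    """Hypothetical helper: check a UID against the five rules above."""
    if len(uid) != 10:              # exactly 10 characters
        return False
    if not set(uid) <= ALLOWED:     # only alphanumeric characters
        return False
    if len(set(uid)) != len(uid):   # no character repeats
        return False
    digits = sum(ch.isdigit() for ch in uid)
    uppers = sum(ch.isupper() for ch in uid)
    return digits >= 3 and uppers >= 2  # at least 3 digits, 2 uppercase


print(is_valid_uid("B1CD102354"))  # False: '1' appears twice
print(is_valid_uid("B1CD602354"))  # True
```

Comparing len(set(uid)) with len(uid) detects repeats in one pass, instead of calling uid.count(ch) for every character as the question's code does; for 10-character strings the difference hardly matters.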

You can perform the validation by attempting to match the string against the following regular expression.

^(?=(?:.*\d){3})(?=(?:.*[A-Z]){2})(?!.*(.).*\1)[A-Za-z0-9]{10}$
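In Python, the pattern can be applied with re.match; the ^ and $ anchors make it cover the whole string. A minimal sketch, where UID_RE and check are illustrative names and the sample UIDs are the ones from the question's test:

```python
import re

# The pattern above; ^ and $ anchor it to the whole string.
UID_RE = re.compile(r'^(?=(?:.*\d){3})(?=(?:.*[A-Z]){2})(?!.*(.).*\1)[A-Za-z0-9]{10}$')


def check(uid: str) -> str:
    # match() returns a Match object on success and None otherwise
    return 'Valid' if UID_RE.match(uid) else 'Invalid'


for uid in ['B1CD102354', 'B1CD602354']:
    print(uid, check(uid))
# B1CD102354 Invalid
# B1CD602354 Valid
```

Note that $ also matches just before a trailing newline, so re.fullmatch (or \Z instead of $) is slightly stricter. With input() this makes no difference, because the newline is already stripped.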


Python's regex engine performs the following operations.

^               : assert beginning of string
(?=             : positive lookahead to assert string contains at
                  least three digits
  (?:.*\d)      : match 0+ chars, then 1 digit in a non-capture group
  {3}           : execute non-capture group thrice
)               : end positive-lookahead
(?=             : positive lookahead to assert string contains at
                  least two capital letters
  (?:.*[A-Z])   : match 0+ chars, then 1 uppercase letter in a
                  non-capture group
  {2}           : execute non-capture group twice 
)               : end positive-lookahead
(?!             : negative lookahead to assert string does not contain
                  the same character twice
  .*(.).*\1     : match 0+ chars, 1 character saved to capture group
                  1, 0+ chars, contents of capture group 1
)               : end negative lookahead
[A-Za-z0-9]{10} : match 10 letters or digits
$               : assert end of string

Notice that a positive lookahead is used to assert that the string contains "something" (here, at least 3 digits and at least 2 capital letters). A negative lookahead is used to assert that the string does not contain "something" (here, a character that is repeated). [A-Za-z0-9]{10}, together with the ^ and $ anchors, asserts both the permitted characters in the string and the length of the string.
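To see what each piece contributes, the lookaheads can be tried on their own. In this sketch the dictionary keys are arbitrary labels; each value is one component of the regex above, applied with re.match so it is tested from the start of the string. A lookahead that succeeds produces a zero-width match, which is still truthy.

```python
import re

# Each component of the UID regex, tested in isolation from position 0.
parts = {
    'at least 3 digits': r'(?=(?:.*\d){3})',
    'at least 2 uppercase': r'(?=(?:.*[A-Z]){2})',
    'no repeated character': r'(?!.*(.).*\1)',
    '10 alphanumerics, nothing else': r'[A-Za-z0-9]{10}$',
}

for uid in ['B1CD102354', 'B1CD602354']:
    results = {name: bool(re.match(p, uid)) for name, p in parts.items()}
    print(uid, results)
# B1CD102354 fails only 'no repeated character' (the '1' repeats);
# B1CD602354 passes all four.
```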

A more efficient variant of this regex was suggested in a comment by @Thefourthbird:

^(?=(?:[^\d\s]*\d){3})(?=(?:[^A-Z\s]*[A-Z]){2})(?!.*?(.).*\1)[A-Za-z0-9]{10}$
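The variant should accept exactly the same strings; it only changes how the engine gets there. In the original, .* first runs to the end of the string and then backtracks to find each digit or capital, while [^\d\s]* and [^A-Z\s]* stop at the next character being counted, which is presumably where the saved steps come from. A quick sketch checking that the two agree on a few sample strings (the samples beyond the question's two UIDs are made up for illustration):

```python
import re

ORIGINAL = r'^(?=(?:.*\d){3})(?=(?:.*[A-Z]){2})(?!.*(.).*\1)[A-Za-z0-9]{10}$'
# [^\d\s]* and [^A-Z\s]* only skip characters that are not being counted,
# so the lookaheads need less backtracking than with .*
VARIANT = r'^(?=(?:[^\d\s]*\d){3})(?=(?:[^A-Z\s]*[A-Z]){2})(?!.*?(.).*\1)[A-Za-z0-9]{10}$'

samples = ['B1CD102354', 'B1CD602354', 'b1cd602354', 'AB12345678', 'AB1234567 ', 'AB123']

for s in samples:
    a = bool(re.match(ORIGINAL, s))
    b = bool(re.match(VARIANT, s))
    assert a == b, s  # both patterns should give the same verdict
    print(repr(s), 'Valid' if a else 'Invalid')
```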

That regex requires 596 steps for the test string, compared with 906 for the regex I proposed.

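You have kept your variables u, d and c as globals, initialised only once before the loop. That is why, in the next iteration of the loop, they still hold the values left over from the previous UID, and the output comes out wrong. In particular, once a UID with a repeated character has been seen, c stays greater than 0, so the c == 0 check fails for every UID after it.

Initialise the variables inside the loop (together with flag, which the question's code never initialises) and your output will be correct.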
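A small sketch isolating the duplicate counter c makes the carry-over visible. The function duplicate_counts and its reset parameter are hypothetical; the counting logic is the same uid.count(ch) > 1 test as in the question.

```python
def duplicate_counts(uids, reset):
    """Return the value of c after each UID (hypothetical demo helper)."""
    c = 0  # initialised once, before the loop, as in the question
    result = []
    for uid in uids:
        if reset:
            c = 0  # the fix: start from zero for every UID
        for ch in uid:
            if uid.count(ch) > 1:
                c += 1
        result.append(c)
    return result


uids = ['B1CD102354', 'B1CD602354']
print(duplicate_counts(uids, reset=False))  # [2, 2] -> second UID fails c == 0
print(duplicate_counts(uids, reset=True))   # [2, 0] -> second UID passes
```

The u and d counters leak the same way, only in the other direction: they keep growing, so a UID with too few digits or capitals could wrongly pass if it followed a valid one.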


Corrected code:

import re

n = int(input())
for _ in range(n):
    # Reset the counters for every UID; otherwise values carry over
    # from the previous iteration.
    u, d, c, flag = 0, 0, 0, 0
    uid = input()
    uid = ''.join(sorted(uid))
    if len(uid) == 10:  # a valid UID has exactly 10 characters
        # flag = -1 only if every character is a-z, A-Z or 0-9
        if re.fullmatch('[A-Za-z0-9]+', uid):
            flag = -1
        for ch in uid:
            if uid.count(ch) > 1:  # repeated character
                c += 1
            if ch.isupper():  # uppercase letter
                u += 1
            if ch.isdigit():  # digit
                d += 1
    if u >= 2 and d >= 3 and flag == -1 and c == 0:
        print('Valid')
    else:
        print('Invalid')

OUTPUT:

2
B1CD102354
Invalid
B1CD602354
Valid

It looks like you are never resetting the variables between iterations of the loop.

import re

n=int(input())
for i in range(0,n):
    u,d,c,flag=0,0,0,0 # moved inside the loop so the counters reset for every UID
    uid=str(input())
    uid = "".join(sorted(uid))
    if (len(uid)<=10):   
        for i in uid:
            if(re.search("[a-z]", uid)):
                flag=-1
            if(re.search("[0-9]", uid)):
                flag=-1
            if(re.search("[A-Z]", uid)):
                flag=-1
            if(uid.count(i)>1):
                c+=1
            if(i.isupper()):  #uppercase
                u+=1
            if(i.isdigit()):   
                    d+=1    
    if(u>=2 and d>=3 and flag==-1 and c==0):
        print("Valid")
    else:
        print("Invalid")

The technical posts on this site are licensed under CC BY-SA 4.0. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.
