Regexp for multiple keywords matching

I have the following case, where I need to get the username and password from a string containing username=xxx; and password=yyy;

There is no restriction on the contents of the username or the password, except that ; is the delimiter between keywords. The username is always preceded by username= and the password is always preceded by password= .

I tried to construct the following, but I only get part of the result I want:

set value "colour=blue;
age=25;
name=anthony;
username=firstuser;
username=hisuser;
password=test123"

set value2 "colour=blue;
age=25;
name=brothersofanthony;
username=seconduser;
password=test123;"

set value3 "username=user-3"

set value4 "username=user4"


regexp -nocase -- {\y(?:username=|password=)[a-z0-9]+} $value match match2
puts "value is $match and match2 is $match2"

regexp -nocase -- {\y(?:username=|password=)[a-z0-9]+} $value2 match match2
puts "value 2 is $match and match2 is $match2"

regexp -nocase -- {\y(?:username=|password=)[a-z0-9]+} $value3 match match2
puts "value 3 is $match and match2 is $match2"

regexp -nocase -- {\y(?:username=|password=)[a-z0-9]+} $value4 match match2
puts "value 4 is $match and match2 is $match2"

I am trying to build a regexp that returns both the username and the password. With the above regexp, I only get the username, and only correctly when it consists of [a-z0-9], while it can actually contain other symbols as well (anything apart from ; , which is the delimiter).

If multiple occurrences are found in the string (e.g. value contains two usernames), the first one should be used.

The second issue with the above regexp is that it does not return the password value, which must follow the same rules as the username.

How can I improve the above regexp?

You need to separate the matches in this particular case, or else you won't be able to distinguish between a username and a password. I would advise using one regexp for the username and another for the password. Next, change the character class from [a-z0-9]+ to [^;]+ so that it matches every character except ; . Because regexp without -all stops at the first match, the first username in the string is the one that is captured.

set value "colour=blue;
age=25;
name=anthony;
username=firstuser;
username=hisuser;
password=test123"

regexp -nocase -- {\yusername=([^;]+)} $value - username
regexp -nocase -- {\ypassword=([^;]+)} $value - password
puts $username
puts $password
# => firstuser
# => test123
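
Not every record contains both keywords (value3 in the question has no password). When regexp finds no match it returns 0 and leaves the capture variable unset, so it is worth checking the return value. A minimal sketch, assuming a hypothetical helper named firstField and a caller-supplied default (neither is part of the answer above):

```tcl
# Hypothetical helper: return the first value stored under $key in $record,
# or $default when the key does not occur at all.
proc firstField {record key {default ""}} {
    # \y is a word boundary; [^;]+ takes everything up to the next semicolon.
    # regexp returns 1 on a match and 0 otherwise.
    if {[regexp -nocase -- "\\y${key}=(\[^;\]+)" $record -> found]} {
        return $found
    }
    return $default
}

set value3 "username=user-3"
puts [firstField $value3 username]          ;# user-3
puts [firstField $value3 password <none>]   ;# <none>
```

The key is interpolated into the pattern as-is, so this sketch assumes keys made of plain word characters such as username and password.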

As usual, regular expressions are really far more work than necessary.

proc getUsernameAndPassword record {
    set res [dict create]
    foreach {keyword value} [split [string map [list \n {}] $record] \;=] {
        if {$keyword in {username password} && $keyword ni [dict keys $res]} {
            dict set res $keyword $value
        }
    }
    if {[dict size $res]} {
        return $res
    } else {
        return None
    }
}

This command returns the string None if neither a username nor a password can be found in the record. If either value is found, it returns a dictionary containing the relevant keyword ( username or password ) followed by its value. If both values are found, the dictionary contains both keywords, each followed by its value.

The command transforms your record into a key-value list by removing all newline characters and then splitting the string at each semicolon or equals sign. Each key-value pair is checked to see whether the key is either username or password and whether the keyword has not yet been added to res . If both conditions are true, the keyword and value are stored in res . If, at the end of the command, anything has been stored in res , the dictionary is returned; otherwise None is returned.
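
One thing to keep in mind: the question allows any character except ; in a value, and splitting at every equals sign would break up a value that itself contains = . A possible variant, sketched under that assumption, splits on semicolons only and then cuts each field at its first = . The name parseCredentials is hypothetical, and unlike the command above it returns an empty dictionary rather than None when nothing is found:

```tcl
# Hypothetical variant: split on ";" only, then cut each field at its FIRST "=".
proc parseCredentials {record} {
    set res [dict create]
    foreach field [split $record ";"] {
        # Strip the newline and spaces left over around each field.
        set field [string trim $field]
        set eq [string first "=" $field]
        if {$eq < 0} continue    ;# no "=" in this field, skip it
        set keyword [string range $field 0 [expr {$eq - 1}]]
        set value   [string range $field [expr {$eq + 1}] end]
        # Keep only the first occurrence of each wanted keyword.
        if {$keyword in {username password} && ![dict exists $res $keyword]} {
            dict set res $keyword $value
        }
    }
    return $res
}

set value "colour=blue;\nage=25;\nusername=first=user;\nusername=hisuser;\npassword=test123"
puts [parseCredentials $value]
# => username first=user password test123
```

Callers can test the result with dict size or dict exists instead of comparing against a sentinel string.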

Documentation: dict , foreach , if , list , proc , return , set , split , string

I think the easiest way to do this is to use regexp with -all -inline and loop over the matches:

set RE {^(username|password)=(.+?)(?:;|$)}
foreach {matched field contents} [regexp -all -inline -line $RE $value] {
    puts "I found '$field' which held '$contents'"
}

On your first sample, this produces:

I found 'username' which held 'firstuser'
I found 'username' which held 'hisuser'
I found 'password' which held 'test123'

We're using -all to match every possible place, not just the first of them, -inline to get the matches returned (so we can foreach over them), and -line to make the RE engine not match things over lines (affects . , ^ and $ ).

You'll have to decide what to do when a field is present twice, but that's no longer matching so much as parsing into a higher-level concept.
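
Since the question asks for the first occurrence to win, one way to make that decision is to collect the matches into a dictionary and skip any field that is already present. A sketch only; the variable name creds is an assumption, and the sample record is the first one from the question:

```tcl
set value "colour=blue;
age=25;
name=anthony;
username=firstuser;
username=hisuser;
password=test123"

set RE {^(username|password)=(.+?)(?:;|$)}
set creds [dict create]
foreach {matched field contents} [regexp -all -inline -line $RE $value] {
    # Store a field only the first time it is seen; later duplicates are ignored.
    if {![dict exists $creds $field]} {
        dict set creds $field $contents
    }
}
puts $creds
```

Going by the matches listed above, creds should end up holding firstuser for username and test123 for password. Swapping the dict exists check for an unconditional dict set would make the last occurrence win instead.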
