
Splitting a string to find words between delimiters?

Given a certain line that looks like this:

jfdajfjlausername=Bob&djfkaak;jdskjpassword=12345&

I want to return the username and password, in this case Bob and 12345.

I tried splitting the string on the & sign but could not figure out how to then find the individual words. I also tried the code below:

left='password='
right='&'
userleft='username='
for x in file.readlines():
    if 'password=' and 'username=' in x:
        text=str(x)
        #password=(text[text.index(left)+len(left):text.index(right)])
        #username=(text[text.index(userleft)+len(userleft):text.index(useright)])
        

Without using regular expressions, you can split twice: once on & and once on =:

line = 'jfdajfjlausername=Bob&djfkaak;jdskjpassword=12345&'
items = [item.split('=') for item in line.split('&')]

Now you can extract the values:

for item in items:
    if len(item) == 2:
        if item[0].endswith('password'):
            password = item[1]
        elif item[0].endswith('username'):
            username = item[1]

If you had several keys to look for, like ('username', 'password'), you could write a nested loop to build a dictionary:

keys = ('username', 'password')
result = {}
for item in items:
    if len(item) == 2:
        for k in keys:
            if item[0].endswith(k):
                result[k] = item[1]
                break

This makes it a lot easier to check that you got all the values you want, e.g. with if len(keys) == len(result): ...
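The loop and the completeness check can be folded into one helper. This is only a minimal sketch: the name extract_fields is made up for illustration, and it uses str.partition in place of split('=') so that a value containing "=" is kept whole.

```python
def extract_fields(line, keys=('username', 'password')):
    """Return {key: value} for every key that ends a field name in line."""
    result = {}
    for field in line.strip().split('&'):
        # partition splits on the first '=' only
        name, sep, value = field.partition('=')
        if not sep:
            continue  # no '=' here, e.g. the empty piece after the last '&'
        for k in keys:
            if name.endswith(k):
                result[k] = value
                break
    return result


line = 'jfdajfjlausername=Bob&djfkaak;jdskjpassword=12345&'
fields = extract_fields(line)
if len(fields) == 2:  # both keys were found
    print(fields['username'], fields['password'])  # Bob 12345
```

A line that lacks one of the keys simply produces a shorter dictionary, so the length check is enough to skip it.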

If you want a very simple approach, you could do this:

data = 'jfdajfjlausername=Bob&djfkaak;jdskjpassword=12345&'

#right of "username=" and left of "&"
un = data.split('username=')[1].split('&')[0]

#right of "password=" and left of "&"
pw = data.split('password=')[1].split('&')[0]

print(un, pw)  # Bob 12345

Since the process is identical except for the key, you can wrap it in a small helper that returns the value for any key in the query. A useful side effect: this still works even if the query does not end in "&". In that case everything that is left ends up in .split('&')[0], and nothing below uses .split('&')[1], so the missing "&" does not matter.

query = 'jfdajfjlausername=Bob&djfkaak;jdskjpassword=12345&'

key2val = lambda q,k: q.split(f'{k}=')[1].split('&')[0]

un = key2val(query, 'username')
pw = key2val(query, 'password')

print(un, pw)  # Bob 12345

This method is a reasonable alternative to regex. It is likely to be faster (though that is worth measuring if speed matters), it needs no imports or explicit loops, and it returns the value for any key regardless of the order in which the keys appear. Note that it raises an IndexError if the key is missing from the string.
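One caveat with indexing the result of split with [1] is that it raises IndexError when the key is not in the string. A hedged variant, assuming you would rather get a default value back in that case (key2val_safe and its default parameter are illustrative names, not part of the answer above), could use str.partition:

```python
def key2val_safe(query, key, default=None):
    """Return the text between 'key=' and the next '&', or default if absent."""
    _, found, rest = query.partition(f'{key}=')
    if not found:
        return default  # key not present, so there is nothing to index
    return rest.split('&')[0]


query = 'jfdajfjlausername=Bob&djfkaak;jdskjpassword=12345'  # no trailing '&'

print(key2val_safe(query, 'username'))  # Bob
print(key2val_safe(query, 'password'))  # 12345
print(key2val_safe(query, 'email'))     # None
```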

Use Regex:

import re
for x in file.readlines():
    if 'password=' in x and 'username=' in x:
        text=str(x)
        # raw strings keep \w from being treated as a string escape
        username = re.findall(r'username=(\w+)', text)
        password = re.findall(r'password=(\w+)', text)

Note the updated if statement. In the original, the condition is parsed as 'password=' and ('username=' in x). The string 'password=' is not empty, so it always evaluates to True, and only the username test has any effect.
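A quick way to see the difference between the two conditions is to evaluate both on a line that contains only one of the fields. The sample line here is made up for the demonstration:

```python
line = 'username=Bob&'  # has a username but no password

# Python parses this as: 'password=' and ('username=' in line)
loose = bool('password=' and 'username=' in line)

# Each substring gets its own membership test
strict = 'password=' in line and 'username=' in line

print(loose, strict)  # True False
```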

This reads a file named "text" and parses out the username and password for each line if they both exist.

This solution assumes that the username and password fields both end with a "&".

Note that this code will work even if the order of the username and password is reversed.

import re
with open('text') as f:
    for line in f:
        print(line.strip())
        # Note that ([^&]+) captures any characters up to the next &.
        m1 = re.search('username=([^&]+)', line)
        m2 = re.search('password=([^&]+)', line)
        if m1 and m2:
            print('username=', m1[1])
            print('password=', m2[1])

Output:

jfdajfjlausername=Bob&djfkaak;jdskjpassword=12345&
username= Bob
password= 12345
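The same two-search idea can be wrapped in a function so it can be tried without a file. This is a sketch under one extra assumption: the last field may lack a trailing "&", so the character class also excludes the newline. The name parse_line is hypothetical.

```python
import re


def parse_line(line):
    """Return (username, password) if both fields are present, else None."""
    # [^&\n]+ stops at '&' or at the end of the line, whichever comes first
    m1 = re.search(r'username=([^&\n]+)', line)
    m2 = re.search(r'password=([^&\n]+)', line)
    if m1 and m2:
        return m1.group(1), m2.group(1)
    return None


# Fields in reversed order, no trailing '&'
print(parse_line('password=12345&username=Bob\n'))  # ('Bob', '12345')
```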

You can use a single regular expression to parse this information out:

import re

s = "jfdajfjlausername=Bob&djfkaak;jdskjpassword=12345&"
regex = "username=(?P<username>.+)&.*password=(?P<password>.+)&"
match = re.search(regex, s)

print(match.groupdict())
# {'username': 'Bob', 'password': '12345'}

Implementing this while looping over the lines in a file would look like:

import re

regex = "username=(?P<username>.+)&.*password=(?P<password>.+)&"

with open('text') as f:
    for line in f:
        match = re.search(regex, line)
        if match is not None:
            print(match.groupdict())
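One thing to watch with .+ is that it is greedy: if a line carries extra &-separated fields, a group can swallow more than one of them. A possible variant, assuming values never contain "&", swaps .+ for [^&]+ so each group stops at the first "&". Like the pattern above, it still requires username to appear before password. The sample string with the extra x and y fields is invented for the demonstration.

```python
import re

# [^&]+ stops at the first '&', so neighbouring fields are never captured
pattern = re.compile(
    r'username=(?P<username>[^&]+)&.*password=(?P<password>[^&]+)'
)

s = 'jfdajfjlausername=Bob&x=1&djfkaak;jdskjpassword=12345&y=2&'
match = pattern.search(s)
if match is not None:
    print(match.groupdict())  # {'username': 'Bob', 'password': '12345'}
```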


 