
Python regular expression to match strings

I want to parse a string, such as:

package: name='jp.tjkapp.droid1lwp' versionCode='2' versionName='1.1'
uses-permission:'android.permission.WRITE_APN_SETTINGS'
uses-permission:'android.permission.RECEIVE_BOOT_COMPLETED'
uses-permission:'android.permission.ACCESS_NETWORK_STATE'

I want to get:

string1: jp.tjkapp.droid1lwp

string2: 1.1

Because there are multiple uses-permission lines, I want to get the permissions as a list containing WRITE_APN_SETTINGS, RECEIVE_BOOT_COMPLETED and ACCESS_NETWORK_STATE.

Could you help me write a Python regular expression to get the strings I want? Thanks.

Assuming the code block you provided is one long string, here stored in a variable called input_string:

import re

name = re.search(r"(?<=name\=\')[\w\.]+?(?=\')", input_string).group(0)
versionName = re.search(r"(?<=versionName\=\')\d+?\.\d+?(?=\')", input_string).group(0)
permissions = re.findall(r'(?<=android\.permission\.)[A-Z_]+(?=\')', input_string)

Explanation:

name

  • (?<=name\=\') : a lookbehind. It checks the text immediately before the main match, so only strings preceded by name=' are returned. The \ in front of = and ' escapes them so they are read as literal characters; neither actually needs escaping in this position, but the escapes are harmless. name=' itself is not part of the result; we just know that every result we get is preceded by it.
  • [\w\.]+? : This is the main string we're searching for. \w means any alphanumeric character or underscore. \. is an escaped period, so the regex knows we mean a literal . and not the "any character" command represented by an unescaped period. Putting these in [] means we accept anything listed in the brackets: any alphanumeric character, _ , or . . The + afterwards means at least one of the previous thing, that is, one or more of [\w\.] . Finally, the ? means don't be greedy: the regex takes the shortest match that meets these specifications, since + on its own would keep consuming as many characters matched by [\w\.] as it can.
  • (?=\') : a lookahead. It checks the text immediately after the main match, so only strings followed by ' are returned. The \ is again an escape; it is only strictly required in the permissions pattern, which is written with single quotes, where it keeps ' from ending the Python string literal. This final ' is not returned with our results; we just know that in the original string it followed any result we get.
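To make the lookaround explanation concrete, here is a minimal sketch that runs against the sample input from the question. The first pattern uses the lookbehind/lookahead form described above, without the optional escapes. The other two use plain capture groups, which is an alternative not taken from the answer, shown only for comparison. The variable names are illustrative.

```python
import re

# Sample input copied from the question, stored as one multi-line string.
input_string = """package: name='jp.tjkapp.droid1lwp' versionCode='2' versionName='1.1'
uses-permission:'android.permission.WRITE_APN_SETTINGS'
uses-permission:'android.permission.RECEIVE_BOOT_COMPLETED'
uses-permission:'android.permission.ACCESS_NETWORK_STATE'
"""

# Lookaround form: name=' and the closing ' are checked but not returned.
name = re.search(r"(?<=name=')[\w.]+?(?=')", input_string).group(0)

# Capture-group form (an alternative): the whole pattern matches
# versionName='1.1', while group(1) holds only the text inside the quotes.
version_name = re.search(r"versionName='([^']*)'", input_string).group(1)

# With exactly one group, findall returns that group's text for every match.
permissions = re.findall(r"android\.permission\.([A-Z_]+)'", input_string)

print(name)
print(version_name)
print(permissions)
```

Note that name=' does not match inside versionName=' because the search is case-sensitive, so the first pattern only finds the package name.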

You can do this without regex by reading the file content line by line.

>>> def split_string(s):
...     if s.startswith('package'):
...         # take the value after '=' in each key='value' token, without the quotes
...         return [i.split('=')[1].strip("'") for i in s.split() if "=" in i]
...     elif s.startswith('uses-permission'):
...         # take the last dot-separated component, without the trailing quote
...         return s.split('.')[-1].strip("'")
... 
>>> split_string("package: name='jp.tjkapp.droid1lwp' versionCode='2' versionName='1.1'")
['jp.tjkapp.droid1lwp', '2', '1.1']
>>> split_string("uses-permission:'android.permission.WRITE_APN_SETTINGS'")
'WRITE_APN_SETTINGS'
>>> split_string("uses-permission:'android.permission.RECEIVE_BOOT_COMPLETED'")
'RECEIVE_BOOT_COMPLETED'
>>> split_string("uses-permission:'android.permission.ACCESS_NETWORK_STATE'")
'ACCESS_NETWORK_STATE'

Here is one example:

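#!/usr/bin/env python
# Read test.txt and pull the name and versionName values from the package line.
with open("test.txt", "r") as input_file:
    for line in input_file:
        if line.startswith("package"):
            words = line.split()
            # words[1] is name='...', words[3] is versionName='...'
            string1 = words[1].split("=")[1].replace("'", "")
            string2 = words[3].split("=")[1].replace("'", "")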

The file test.txt contains the input data from the question.
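The same line-by-line idea can be extended to collect the permissions as a list, which the question also asks for. The following is only a sketch: parse_manifest_lines is a hypothetical helper name, io.StringIO stands in for the file so the example runs without test.txt, and it assumes that the quoted values contain no spaces.

```python
import io

def parse_manifest_lines(lines):
    """Return (name, version_name, permissions) from an iterable of lines."""
    name = None
    version_name = None
    permissions = []
    for line in lines:
        line = line.strip()
        if line.startswith("package:"):
            # Turn each key='value' token into a dictionary entry.
            # Assumes values contain no whitespace.
            fields = {}
            for token in line.split()[1:]:
                key, _, value = token.partition("=")
                fields[key] = value.strip("'")
            name = fields.get("name")
            version_name = fields.get("versionName")
        elif line.startswith("uses-permission:"):
            # Drop the quotes, then keep the part after the last dot.
            value = line.partition(":")[2].strip("'")
            permissions.append(value.rsplit(".", 1)[-1])
    return name, version_name, permissions

# io.StringIO stands in for open("test.txt") in this sketch.
sample = io.StringIO(
    "package: name='jp.tjkapp.droid1lwp' versionCode='2' versionName='1.1'\n"
    "uses-permission:'android.permission.WRITE_APN_SETTINGS'\n"
    "uses-permission:'android.permission.RECEIVE_BOOT_COMPLETED'\n"
    "uses-permission:'android.permission.ACCESS_NETWORK_STATE'\n"
)
name, version_name, permissions = parse_manifest_lines(sample)
print(name, version_name, permissions)
```

Looking fields up by key rather than by position means the result does not depend on versionName being the fourth word on the line.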

