
Python regular expression to match strings

I want to parse a string, such as:

package: name='jp.tjkapp.droid1lwp' versionCode='2' versionName='1.1'
uses-permission:'android.permission.WRITE_APN_SETTINGS'
uses-permission:'android.permission.RECEIVE_BOOT_COMPLETED'
uses-permission:'android.permission.ACCESS_NETWORK_STATE'

I want to get:

string1: jp.tjkapp.droid1lwp

string2: 1.1

Because there are multiple uses-permission lines, I want to get the permissions as a list containing WRITE_APN_SETTINGS, RECEIVE_BOOT_COMPLETED and ACCESS_NETWORK_STATE.

Could you help me write the Python regular expressions to get the strings I want? Thanks.

Assuming the code block you provided is one long string, here stored in a variable called input_string:

import re

name = re.search(r"(?<=name\=\')[\w\.]+?(?=\')", input_string).group(0)
versionName = re.search(r"(?<=versionName\=\')\d+?\.\d+?(?=\')", input_string).group(0)
permissions = re.findall(r'(?<=android\.permission\.)[A-Z_]+(?=\')', input_string)
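To try the three expressions end to end, here is a minimal self-contained sketch. The input_string literal is simply the sample from the question joined with newlines (an assumption about how the data arrives); the patterns are unchanged.

```python
import re

# Sample input from the question, joined into one multi-line string.
input_string = (
    "package: name='jp.tjkapp.droid1lwp' versionCode='2' versionName='1.1'\n"
    "uses-permission:'android.permission.WRITE_APN_SETTINGS'\n"
    "uses-permission:'android.permission.RECEIVE_BOOT_COMPLETED'\n"
    "uses-permission:'android.permission.ACCESS_NETWORK_STATE'\n"
)

# Same three patterns as in the answer.
name = re.search(r"(?<=name\=\')[\w\.]+?(?=\')", input_string).group(0)
versionName = re.search(r"(?<=versionName\=\')\d+?\.\d+?(?=\')", input_string).group(0)
permissions = re.findall(r'(?<=android\.permission\.)[A-Z_]+(?=\')', input_string)

print(name)         # jp.tjkapp.droid1lwp
print(versionName)  # 1.1
print(permissions)  # ['WRITE_APN_SETTINGS', 'RECEIVE_BOOT_COMPLETED', 'ACCESS_NETWORK_STATE']
```

Note that re.search returns None when nothing matches, so calling .group(0) on input without a name=' field would raise AttributeError.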

Explanation:

name

  • (?<=name\=\') : a positive lookbehind. It checks the text immediately before the main match, so only strings preceded by name=' are returned. The backslashes in front of = and ' escape them; this is harmless but not strictly required, since neither character has a special meaning at that point in the pattern. name=' is not included in the result; we just know that every result is preceded by it.
  • [\w\.]+? : the main string we're searching for. \w matches any alphanumeric character or underscore, and \. is an escaped period, so the regex matches a literal . rather than "any character" (inside [] a bare . is already literal, so the escape is optional there). Putting these in [] means any one of them is acceptable: an alphanumeric character, an underscore, or a period. The + afterwards means one or more of the previous item, i.e. at least one (but possibly more) of [\w\.]. Finally, the ? makes the + non-greedy: the regex takes the shortest match that still satisfies the rest of the pattern instead of the longest one.
  • (?=\') : a positive lookahead. It checks the text immediately after the main match, so only strings followed by ' are returned. The backslash is again an escape; it is harmless, though neither the regex engine nor Python needs it here, because the pattern is a raw string delimited by double quotes. This final ' is not included in the result either; we just know that in the original string it follows every result we get.
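Lookarounds are one way to keep name=' and the closing ' out of the result. The sketch below compares them with an ordinary capturing group, a swapped-in alternative rather than the answer's method: with a group, the delimiters are part of the overall match and .group(1) returns only the bracketed part. It also shows that, because [\w.] can never match ', dropping the non-greedy ? gives the same result here. The line variable is just the package line from the question.

```python
import re

line = "package: name='jp.tjkapp.droid1lwp' versionCode='2' versionName='1.1'"

# Lookaround version from the answer: the match itself is only the name.
m1 = re.search(r"(?<=name\=\')[\w\.]+?(?=\')", line)

# Capturing-group version: the whole match includes name=' and the closing ',
# but group(1) holds just the part inside the parentheses.
m2 = re.search(r"name='([\w.]+)'", line)

# Greedy, unescaped lookaround version: [\w.] cannot match ', so it stops
# at the same place as the non-greedy one.
m3 = re.search(r"(?<=name=')[\w.]+(?=')", line)

print(m1.group(0))  # jp.tjkapp.droid1lwp
print(m2.group(0))  # name='jp.tjkapp.droid1lwp'
print(m2.group(1))  # jp.tjkapp.droid1lwp
print(m3.group(0))  # jp.tjkapp.droid1lwp
```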

You can do this without regex by reading the file content line by line.

>>> def split_string(s):
...     if s.startswith('package'):
...         # value part of each key='value' token, with the quotes stripped
...         return [i.split('=')[1].strip("'") for i in s.split() if "=" in i]
...     elif s.startswith('uses-permission'):
...         # the permission name is the last dot-separated component
...         return s.split('.')[-1].strip("'")
... 
>>> split_string("package: name='jp.tjkapp.droid1lwp' versionCode='2' versionName='1.1'")
['jp.tjkapp.droid1lwp', '2', '1.1']
>>> split_string("uses-permission:'android.permission.WRITE_APN_SETTINGS'")
'WRITE_APN_SETTINGS'
>>> split_string("uses-permission:'android.permission.RECEIVE_BOOT_COMPLETED'")
'RECEIVE_BOOT_COMPLETED'
>>> split_string("uses-permission:'android.permission.ACCESS_NETWORK_STATE'")
'ACCESS_NETWORK_STATE'
>>> 
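split_string returns the package values as a positional list, so the caller has to know that the name comes first and versionName third. A minimal sketch that instead builds a dictionary from the package line, so values can be looked up by key, still without regex. It assumes each field is a single whitespace-free key='value' token, as in the sample; parse_package is a hypothetical helper name.

```python
def parse_package(line):
    """Turn "package: k1='v1' k2='v2' ..." into {'k1': 'v1', 'k2': 'v2', ...}.

    Assumes values contain no spaces, as in the sample input.
    """
    fields = {}
    for token in line.split()[1:]:          # skip the leading "package:"
        key, sep, value = token.partition("=")
        if sep:                             # ignore tokens without '='
            fields[key] = value.strip("'")
    return fields


info = parse_package("package: name='jp.tjkapp.droid1lwp' versionCode='2' versionName='1.1'")
print(info["name"])         # jp.tjkapp.droid1lwp
print(info["versionName"])  # 1.1
```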

Here is some example code:

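#!/usr/bin/env python
string1 = string2 = None
permissions = []
with open("test.txt", "r") as inputFile:
    for line in inputFile:
        if line.startswith("package"):
            words = line.split()
            string1 = words[1].split("=")[1].replace("'", "")  # package name
            string2 = words[3].split("=")[1].replace("'", "")  # versionName
        elif line.startswith("uses-permission"):
            # keep the last dot-separated part, minus newline and quote
            permissions.append(line.strip().split(".")[-1].replace("'", ""))
print(string1, string2, permissions)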

The test.txt file contains the input data you mentioned earlier.
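To check the parsing without creating test.txt, one option is to move the loop into a function that accepts any iterable of lines, here also collecting the uses-permission entries. An open file object works as the argument, and so does io.StringIO, used below to stand in for the file. parse_lines is a hypothetical name, and the field positions (words[1], words[3]) assume the package line looks exactly like the sample.

```python
import io


def parse_lines(lines):
    """Return (package name, versionName, list of permission names)."""
    string1 = string2 = None
    permissions = []
    for line in lines:
        if line.startswith("package"):
            words = line.split()
            string1 = words[1].split("=")[1].replace("'", "")
            string2 = words[3].split("=")[1].replace("'", "")
        elif line.startswith("uses-permission"):
            # Last dot-separated component, minus the newline and quote.
            permissions.append(line.strip().split(".")[-1].replace("'", ""))
    return string1, string2, permissions


sample = io.StringIO(
    "package: name='jp.tjkapp.droid1lwp' versionCode='2' versionName='1.1'\n"
    "uses-permission:'android.permission.WRITE_APN_SETTINGS'\n"
    "uses-permission:'android.permission.RECEIVE_BOOT_COMPLETED'\n"
    "uses-permission:'android.permission.ACCESS_NETWORK_STATE'\n"
)
result = parse_lines(sample)
print(result)
```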

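Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.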

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM