Python regular expression to match strings
I want to parse a string such as:
package: name='jp.tjkapp.droid1lwp' versionCode='2' versionName='1.1'
uses-permission:'android.permission.WRITE_APN_SETTINGS'
uses-permission:'android.permission.RECEIVE_BOOT_COMPLETED'
uses-permission:'android.permission.ACCESS_NETWORK_STATE'
I want to get:
string1: jp.tjkapp.droid1lwp
string2: 1.1
Because there are multiple uses-permission lines, I want to get the permissions as a list containing WRITE_APN_SETTINGS, RECEIVE_BOOT_COMPLETED, and ACCESS_NETWORK_STATE.
Could you help me write the Python regular expressions to get the strings I want? Thanks.
Assuming the code block you provided is one long string, stored here in a variable called input_string:
import re

# input_string holds the whole block of text from the question
name = re.search(r"(?<=name\=\')[\w\.]+?(?=\')", input_string).group(0)
versionName = re.search(r"(?<=versionName\=\')\d+?\.\d+?(?=\')", input_string).group(0)
permissions = re.findall(r'(?<=android\.permission\.)[A-Z_]+(?=\')', input_string)
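A self-contained version of the three calls above, with input_string filled in from the question's sample text so the snippet runs as-is. The regular expressions are unchanged; only the sample data and the print calls are added.

```python
import re

# Sample text copied from the question
input_string = """package: name='jp.tjkapp.droid1lwp' versionCode='2' versionName='1.1'
uses-permission:'android.permission.WRITE_APN_SETTINGS'
uses-permission:'android.permission.RECEIVE_BOOT_COMPLETED'
uses-permission:'android.permission.ACCESS_NETWORK_STATE'"""

name = re.search(r"(?<=name\=\')[\w\.]+?(?=\')", input_string).group(0)
versionName = re.search(r"(?<=versionName\=\')\d+?\.\d+?(?=\')", input_string).group(0)
permissions = re.findall(r'(?<=android\.permission\.)[A-Z_]+(?=\')', input_string)

print(name)         # jp.tjkapp.droid1lwp
print(versionName)  # 1.1
print(permissions)  # ['WRITE_APN_SETTINGS', 'RECEIVE_BOOT_COMPLETED', 'ACCESS_NETWORK_STATE']
```

Note that the versionName pattern only accepts two numeric parts such as 1.1; a value like 1.1.2 would need a different pattern.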
Explanation:
(?<=name\=\')
: This is a lookbehind. It checks the text just before the main match, so only strings preceded by name=' are returned. The \ in front of = and ' escapes them so they are read as literal characters. Neither = nor ' is a regex metacharacter, though, so these backslashes are harmless but not required. name=' itself is not returned with the result; we only know that every result we get is preceded by it.
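To see that the lookbehind is zero-width, it can be compared with a capturing group, a common alternative. The sample line is copied from the question, and both patterns are written without the optional backslashes. Both return the same package name, but with the capturing group name=' is part of group(0) and the value has to be read from group(1).

```python
import re

line = "package: name='jp.tjkapp.droid1lwp' versionCode='2' versionName='1.1'"

# Lookbehind: name=' must precede the match but is not part of it
with_lookbehind = re.search(r"(?<=name=')[\w.]+(?=')", line)

# Capturing group: name=' is consumed, so the value lives in group(1)
with_group = re.search(r"name='([\w.]+)'", line)

print(with_lookbehind.group(0))  # jp.tjkapp.droid1lwp
print(with_group.group(0))       # name='jp.tjkapp.droid1lwp'
print(with_group.group(1))       # jp.tjkapp.droid1lwp
```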
[\w\.]+?
: This is the main string we're searching for. \w means any alphanumeric character or underscore. \. is an escaped period: outside brackets an unescaped . means "any character", so the backslash tells the regex we want a literal period. Inside [] a period is already literal, so here the escape is optional. Putting these in [] means a single character matching anything in the brackets is accepted, so we accept any alphanumeric character, an underscore, or a period. The + afterwards means at least one of the previous thing, i.e. at least one (but possibly more) of [\w\.]. Finally, the ? makes the + non-greedy: instead of the longest possible match, the regex takes the shortest one that still satisfies the rest of the pattern. In this pattern the result is the same either way, because [\w\.] cannot match the closing ' that the lookahead requires.
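A quick check of the claims above, using the sample line from the question: dropping the backslashes before =, ' and the . inside [] leaves the result unchanged, and so does dropping the lazy ?. Laziness only matters when the repeated part can also match the delimiter; the '.+' and '.+?' patterns at the end are illustrative examples of that case, since an unbracketed . also matches the quote.

```python
import re

line = "package: name='jp.tjkapp.droid1lwp' versionCode='2' versionName='1.1'"

escaped_lazy = re.search(r"(?<=name\=\')[\w\.]+?(?=\')", line).group(0)
unescaped_lazy = re.search(r"(?<=name=')[\w.]+?(?=')", line).group(0)
unescaped_greedy = re.search(r"(?<=name=')[\w.]+(?=')", line).group(0)
# True: [\w.] cannot match ', so the greedy + stops at the quote anyway
print(escaped_lazy == unescaped_lazy == unescaped_greedy)

# An unbracketed . matches any character, including the quote, so laziness matters
any_greedy = re.search(r"'.+'", line).group(0)   # runs to the last quote on the line
any_lazy = re.search(r"'.+?'", line).group(0)    # stops at the first closing quote
print(any_greedy)  # 'jp.tjkapp.droid1lwp' versionCode='2' versionName='1.1'
print(any_lazy)    # 'jp.tjkapp.droid1lwp'
```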
(?=\')
: This is a lookahead. It checks the text just after the main match, so only strings followed by ' are returned. In the double-quoted raw strings the \ before ' is unnecessary. In the single-quoted permissions pattern r'...(?=\')' it stops Python from ending the string literal early, and the regex engine then reads \' as a plain '. Like name=', this final ' is not returned with the results; we only know that in the original string it follows every result we get.
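The lookahead also acts as a requirement, not just a boundary. In this sketch the first input line is copied from the question, while the unterminated one is a made-up example: with the lookahead, a permission whose closing quote is missing is not matched at all, whereas the same pattern without the lookahead still returns it.

```python
import re

pattern_with_lookahead = r"(?<=android\.permission\.)[A-Z_]+(?=')"
pattern_without = r"(?<=android\.permission\.)[A-Z_]+"

good = "uses-permission:'android.permission.WRITE_APN_SETTINGS'"
broken = "uses-permission:'android.permission.WRITE_APN_SETTINGS"  # closing quote missing

good_matches = re.findall(pattern_with_lookahead, good)      # quote required but not returned
broken_matches = re.findall(pattern_with_lookahead, broken)  # nothing is followed by '
broken_without = re.findall(pattern_without, broken)         # no quote required

print(good_matches)    # ['WRITE_APN_SETTINGS']
print(broken_matches)  # []
print(broken_without)  # ['WRITE_APN_SETTINGS']
```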
You can do this without regex by reading the file content line by line.
>>> def split_string(s):
...     s = s.strip()  # drop a trailing newline when s is a line read from a file
...     if s.startswith('package'):
...         # value after '=', with the surrounding quotes removed
...         return [i.split('=')[1].strip("'") for i in s.split() if '=' in i]
...     elif s.startswith('uses-permission'):
...         # last dotted component, with the trailing quote removed
...         return s.split('.')[-1].strip("'")
...
>>> split_string("package: name='jp.tjkapp.droid1lwp' versionCode='2' versionName='1.1'")
['jp.tjkapp.droid1lwp', '2', '1.1']
>>> split_string("uses-permission:'android.permission.WRITE_APN_SETTINGS'")
'WRITE_APN_SETTINGS'
>>> split_string("uses-permission:'android.permission.RECEIVE_BOOT_COMPLETED'")
'RECEIVE_BOOT_COMPLETED'
>>> split_string("uses-permission:'android.permission.ACCESS_NETWORK_STATE'")
'ACCESS_NETWORK_STATE'
>>>
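To get exactly the three results asked for in the question, a split_string that strips the quotes can be applied to every line of the block and the results gathered. The function is repeated here so the snippet runs on its own; the sample text is copied from the question, and unpacking the package line into three names assumes it always has name, versionCode and versionName in that order.

```python
# Quote-stripping split_string, repeated so this snippet runs on its own
def split_string(s):
    s = s.strip()
    if s.startswith('package'):
        return [i.split('=')[1].strip("'") for i in s.split() if '=' in i]
    elif s.startswith('uses-permission'):
        return s.split('.')[-1].strip("'")
    return None  # any other kind of line

# Sample text copied from the question
text = """package: name='jp.tjkapp.droid1lwp' versionCode='2' versionName='1.1'
uses-permission:'android.permission.WRITE_APN_SETTINGS'
uses-permission:'android.permission.RECEIVE_BOOT_COMPLETED'
uses-permission:'android.permission.ACCESS_NETWORK_STATE'"""

string1 = string2 = None
permissions = []
for line in text.splitlines():
    result = split_string(line)
    if isinstance(result, list):
        # package line: assumes [name, versionCode, versionName]
        string1, _, string2 = result
    elif result is not None:
        permissions.append(result)

print(string1)      # jp.tjkapp.droid1lwp
print(string2)      # 1.1
print(permissions)  # ['WRITE_APN_SETTINGS', 'RECEIVE_BOOT_COMPLETED', 'ACCESS_NETWORK_STATE']
```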
Here is an example script:
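#!/usr/bin/env python
string1 = string2 = None
permissions = []
with open("test.txt") as input_file:  # the file is closed automatically
    for line in input_file:
        if line.startswith("package"):
            words = line.split()
            string1 = words[1].split("=")[1].replace("'", "")  # package name
            string2 = words[3].split("=")[1].replace("'", "")  # versionName
        elif line.startswith("uses-permission"):
            permissions.append(line.strip().split(".")[-1].replace("'", ""))
print(string1, string2, permissions)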
Here test.txt contains the input data from the question.