Extracting a string with a regular expression in Python

Question

I have text in a file that I am reading into a string.

txt = "PRIMARY INDEX its_mnth_content_aggr ( AC_ID ,ROW_ADDED_DT ,NOTE_SEQ_NR ,BIZ_UNIT_CD ,\
DISPATCH_ID ,CASE_CREATE_DT ) \
ABDCGFWERRUU \
asdffggb \
PRIMARY INDEX its_mnth_content_aggr ( AC_CASE ,ROW_ADDED_DT ,NOTE_SEQ_NR ,BIZ_UNIT_CD ,\
DISPATCH_ID ,CASE_CREATE_DT )"

I want to extract each complete primary index clause from it, as in PRIMARY INDEX (....).

So far I have the following:

import re
x3 = re.findall(r"\bPRIMARY\sINDEX\s\w+\W.*", txt)

which gives me:

['PRIMARY INDEX its_mnth_content_aggr ( AC_CASE_ID ,ROW_ADDED_DT ,NOTE_SEQ_NR ,BIZ_UNIT_CD ,DISPATCH_ID ,CASE_CREATE_DT )  ABDCGFWERRUU  qwerrtyyuiu PRIMARY INDEX its_mnth_content_aggr ( AC_CASE_ID ,ROW_ADDED_DT ,NOTE_SEQ_NR ,BIZ_UNIT_CD ,DISPATCH_ID ,CASE_CREATE_DT )']

I want something like this:

['PRIMARY INDEX its_mnth_content_aggr ( AC_CASE_ID ,ROW_ADDED_DT ,NOTE_SEQ_NR ,BIZ_UNIT_CD ,DISPATCH_ID ,CASE_CREATE_DT ) PRIMARY INDEX its_mnth_content_aggr ( AC_CASE_ID ,ROW_ADDED_DT ,NOTE_SEQ_NR ,BIZ_UNIT_CD ,DISPATCH_ID ,CASE_CREATE_DT )']

Can someone please help?

Answer 1

Your regex says that you want a string that starts with PRIMARY INDEX followed by any characters: the trailing .* is greedy and runs to the end of the line. So it matches your whole string.

You have to be more specific:

PRIMARY INDEX[A-Za-z(_,\n\\ ]*\)

The string should start with PRIMARY INDEX.
Then there can be any of the letters or special characters listed in [A-Za-z(_,\n\\ ], followed by * because we don't know how many of these characters there are.
It ends with a ).

You can try it here.
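To make the difference concrete, here is a minimal sketch that runs both the question's pattern and the more specific one on the sample text. It assumes the text is a single line, as the backslash continuations in the question suggest. Note that the character class above contains no digits, so a column name such as COL_1 (a hypothetical example) would stop the match early.

```python
import re

# Sample text from the question, joined into one line.
txt = (
    "PRIMARY INDEX its_mnth_content_aggr ( AC_ID ,ROW_ADDED_DT ,NOTE_SEQ_NR ,BIZ_UNIT_CD ,"
    "DISPATCH_ID ,CASE_CREATE_DT ) "
    "ABDCGFWERRUU "
    "asdffggb "
    "PRIMARY INDEX its_mnth_content_aggr ( AC_CASE ,ROW_ADDED_DT ,NOTE_SEQ_NR ,BIZ_UNIT_CD ,"
    "DISPATCH_ID ,CASE_CREATE_DT )"
)

# The question's pattern: the greedy .* swallows everything up to the end of the line.
greedy = re.findall(r"\bPRIMARY\sINDEX\s\w+\W.*", txt)
print(len(greedy))  # a single match covering the whole string

# The more specific pattern: the class excludes ")", so each match stops
# at the first closing parenthesis.
specific = re.findall(r"PRIMARY INDEX[A-Za-z(_,\n\\ ]*\)", txt)
for clause in specific:
    print(clause)
```

If the column names can contain digits, adding 0-9 to the class is one possible adjustment.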

Answer 2

You can use

re.findall(r'\bPRIMARY\s+INDEX\s+\w+\s*\([^()]*\)', txt)

See the regex demo.

Details

\b - word boundary
PRIMARY\s+INDEX - PRIMARY, 1+ whitespaces, INDEX
\s+ - 1+ whitespaces
\w+ - 1+ word chars
\s* - 0+ whitespaces
\( - a ( char
[^()]* - 0+ chars other than ( and )
\) - a ) char.
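As a rough illustration of the pattern above, the sketch below applies it to a multi-line version of the question's text. The multi-line layout and the whitespace-collapsing step are assumptions added for the example, not part of the original answer. Because [^()]* also matches newlines, each clause is found even when it wraps across lines.

```python
import re

# Pattern from the answer, compiled once for reuse.
PRIMARY_INDEX_RE = re.compile(r"\bPRIMARY\s+INDEX\s+\w+\s*\([^()]*\)")

# Assumed multi-line variant of the question's sample text.
txt = """PRIMARY INDEX its_mnth_content_aggr ( AC_ID ,ROW_ADDED_DT ,NOTE_SEQ_NR ,BIZ_UNIT_CD ,
DISPATCH_ID ,CASE_CREATE_DT )
ABDCGFWERRUU
asdffggb
PRIMARY INDEX its_mnth_content_aggr ( AC_CASE ,ROW_ADDED_DT ,NOTE_SEQ_NR ,BIZ_UNIT_CD ,
DISPATCH_ID ,CASE_CREATE_DT )"""

matches = PRIMARY_INDEX_RE.findall(txt)

# Optional: collapse runs of whitespace (including newlines) to single spaces.
clauses = [" ".join(m.split()) for m in matches]

for clause in clauses:
    print(clause)
```

One limitation worth keeping in mind: since [^()]* excludes both parentheses, a column list that itself contains nested parentheses would not be matched by this pattern.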


Answer 1: score 1, accepted, 2020-09-12 07:29:56

Answer 2: score 0, 2020-09-12 15:55:25
