Extract strings using regex in Python
I have text in a file that I am reading into a string.
txt = r"""PRIMARY INDEX its_mnth_content_aggr ( AC_ID ,ROW_ADDED_DT ,NOTE_SEQ_NR ,BIZ_UNIT_CD ,
DISPATCH_ID ,CASE_CREATE_DT )
ABDCGFWERRUU
asdffggb
PRIMARY INDEX its_mnth_content_aggr ( AC_CASE ,ROW_ADDED_DT ,NOTE_SEQ_NR ,BIZ_UNIT_CD ,\
DISPATCH_ID ,CASE_CREATE_DT )"""
I want to extract each complete primary index definition from it, i.e. PRIMARY INDEX name (....).
So far I have the following:
import re
x3 = re.findall(r"\bPRIMARY\sINDEX\s\w+\W.*", txt)
That gives me:
['PRIMARY INDEX its_mnth_content_aggr ( AC_CASE_ID ,ROW_ADDED_DT ,NOTE_SEQ_NR ,BIZ_UNIT_CD ,DISPATCH_ID ,CASE_CREATE_DT ) ABDCGFWERRUU qwerrtyyuiu PRIMARY INDEX its_mnth_content_aggr ( AC_CASE_ID ,ROW_ADDED_DT ,NOTE_SEQ_NR ,BIZ_UNIT_CD ,DISPATCH_ID ,CASE_CREATE_DT )']
I want something like this:
['PRIMARY INDEX its_mnth_content_aggr ( AC_CASE_ID ,ROW_ADDED_DT ,NOTE_SEQ_NR ,BIZ_UNIT_CD ,DISPATCH_ID ,CASE_CREATE_DT ) PRIMARY INDEX its_mnth_content_aggr ( AC_CASE_ID ,ROW_ADDED_DT ,NOTE_SEQ_NR ,BIZ_UNIT_CD ,DISPATCH_ID ,CASE_CREATE_DT )']
Can someone please help?
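A minimal sketch reproducing the problem. It assumes the file contents end up on a single line (the output above suggests the line breaks were lost when the file was read, but that is an assumption), and it shortens the column lists. Because .* is greedy and . matches anything except a newline, the match runs from the first PRIMARY INDEX to the end of the line:

```python
import re

# Assumption: single-line, shortened version of the question's sample text
txt = ("PRIMARY INDEX its_mnth_content_aggr ( AC_ID ,ROW_ADDED_DT ) "
       "ABDCGFWERRUU asdffggb "
       "PRIMARY INDEX its_mnth_content_aggr ( AC_CASE ,ROW_ADDED_DT )")

# The pattern from the question: .* swallows everything up to the end of the line
matches = re.findall(r"\bPRIMARY\sINDEX\s\w+\W.*", txt)
print(len(matches))  # 1: a single match covering both index definitions
```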
Your regex says that you want a string that starts with PRIMARY INDEX followed by any characters. Since .* is greedy, it runs to the end of the line, so it swallows everything after the first PRIMARY INDEX, including the second definition.
You have to be more specific:
PRIMARY INDEX[A-Za-z(_,\n\\ ]*\)
PRIMARY INDEX - the string should start with PRIMARY INDEX
[A-Za-z(_,\n\\ ] - then any letter or one of the characters ( _ , newline, backslash or space, followed by * because we don't know how many of these characters there are
\) - and it ends with ). Because ) is not in the character class, each match stops at the first closing parenthesis.
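A minimal sketch applying this pattern with re.findall. It assumes the text contains real line breaks (hence \n in the class) and keeps the literal backslash from the question (hence \\), which is why the sample is a raw string:

```python
import re

# Sample text modelled on the question; the raw string keeps the literal backslash
txt = r"""PRIMARY INDEX its_mnth_content_aggr ( AC_ID ,ROW_ADDED_DT ,NOTE_SEQ_NR ,BIZ_UNIT_CD ,
DISPATCH_ID ,CASE_CREATE_DT )
ABDCGFWERRUU
asdffggb
PRIMARY INDEX its_mnth_content_aggr ( AC_CASE ,ROW_ADDED_DT ,NOTE_SEQ_NR ,BIZ_UNIT_CD ,\
DISPATCH_ID ,CASE_CREATE_DT )"""

# Letters, "(", "_", ",", newline, backslash and space may repeat; ")" ends the match
pattern = r"PRIMARY INDEX[A-Za-z(_,\n\\ ]*\)"
matches = re.findall(pattern, txt)
for m in matches:
    print(repr(m))
```

The lines between the two definitions (ABDCGFWERRUU, asdffggb) are skipped because findall resumes searching after the first closing parenthesis. Note that the class has no digits, so a column name containing one would break the match.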
You can use
re.findall(r'\bPRIMARY\s+INDEX\s+\w+\s*\([^()]*\)', txt)
See the regex demo.
Details:
\b - word boundary
PRIMARY\s+INDEX - PRIMARY, one or more whitespace chars, INDEX
\s+ - one or more whitespace chars
\w+ - one or more word chars
\s* - zero or more whitespace chars
\( - a ( char
[^()]* - zero or more chars other than ( and )
\) - a ) char
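A minimal sketch running this pattern on the question's text (the sample omits the question's trailing backslash; [^()]* would match it as well). The whitespace normalisation with str.split and str.join is an addition, not part of the answer; it assumes you want each definition on one line, as in the desired output above:

```python
import re

txt = """PRIMARY INDEX its_mnth_content_aggr ( AC_ID ,ROW_ADDED_DT ,NOTE_SEQ_NR ,BIZ_UNIT_CD ,
DISPATCH_ID ,CASE_CREATE_DT )
ABDCGFWERRUU
asdffggb
PRIMARY INDEX its_mnth_content_aggr ( AC_CASE ,ROW_ADDED_DT ,NOTE_SEQ_NR ,BIZ_UNIT_CD ,
DISPATCH_ID ,CASE_CREATE_DT )"""

# [^()]* can cross line breaks because a negated class also matches "\n"
matches = re.findall(r'\bPRIMARY\s+INDEX\s+\w+\s*\([^()]*\)', txt)

# Optional: collapse each match's internal whitespace (including newlines) to single spaces
flat = [" ".join(m.split()) for m in matches]
for line in flat:
    print(line)
```

Unlike the single-character-class approach, [^()]* does not need to list every allowed character, so digits or other symbols inside the parentheses are also accepted.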