
Python String split using a regex

We want to split a multi-line string, for example:

|---------------------------------------------Title1(a)---------------------------------------------

Content goes here, the quick brown fox jumps over the lazy dog

|---------------------------------------------Title1(b)----------------------------------------------

Content goes here, the quick brown fox jumps over the lazy dog

Here's our Python code that splits using a regex:

import re

str1 = "|---------------------------------------------Title1(a)---------------------------------------------" \
    "" \
    "Content goes here, the quick brown fox jumps over the lazy dog" \
    "" \
    "|---------------------------------------------Title1(b)----------------------------------------------" \
    "" \
    "Content goes here, the quick brown fox jumps over the lazy dog" \
    "|"

print(str1)

# Raw string so that \| reaches the regex engine as an escaped pipe.
str2 = re.split(r"\|---------------------------------------------", str1)


print(str2)

We want the output to include only:

str2[0] :

Content goes here, the quick brown fox jumps over the lazy dog

str2[1] :

Content goes here, the quick brown fox jumps over the lazy dog

What's the proper regex to use, or is there any other way to split a string in the format above?

Instead of using split, you can match the lines and capture the part that you want in a group.

\|-{2,}[^-]+-{2,}([^-].*?)(?=\|)

Explanation

  • \| Match |
  • -{2,} Match 2 or more -
  • [^-]+ Match 1+ times any char except -
  • -{2,} Match 2 or more -
  • ( Capture group 1
    • [^-].*? Match any char except -, then any char as few times as possible
  • ) Close group 1
  • (?=\|) Positive lookahead, assert a | to the right
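
The same pattern can be written with re.VERBOSE so that each part listed above carries its own comment. This is only a sketch; the name pattern and the shortened sample string are made up for illustration and are not part of the original question.

```python
import re

# Same pattern as above, spread over several lines with re.VERBOSE.
pattern = re.compile(
    r"""
    \|          # a literal |
    -{2,}       # two or more dashes
    [^-]+       # the title: one or more characters other than -
    -{2,}       # two or more dashes
    (           # start of capture group 1
      [^-].*?   # a non-dash character, then as few characters as possible
    )           # end of capture group 1
    (?=\|)      # lookahead: a | must follow, but it is not consumed
    """,
    re.VERBOSE,
)

# Hypothetical shortened input in the same shape as str1.
sample = "|--Title1(a)--first part|--Title1(b)--second part|"
print(pattern.findall(sample))
```

Because the lookahead does not consume the |, the next match can start at that same | character.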

Regex demo | Python demo

Example

import re
 
# Capture the text after each "|---Title---" header, up to the next "|".
regex = r"\|-{2,}[^-]+-{2,}([^-].*?)(?=\|)"
 
str1 = "|---------------------------------------------Title1(a)---------------------------------------------" \
    "" \
    "Content goes here, the quick brown fox jumps over the lazy dog" \
    "" \
    "|---------------------------------------------Title1(b)----------------------------------------------" \
    "" \
    "Content goes here, the quick brown fox jumps over the lazy dog" \
    "|"
 
# findall returns the contents of capture group 1 for every match.
str2 = re.findall(regex, str1)
print(str2[0])
print(str2[1])

Output

Content goes here, the quick brown fox jumps over the lazy dog
Content goes here, the quick brown fox jumps over the lazy dog
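
Note that str1 in the example is built from adjacent string literals, so it contains no newline characters. If the input really spans several lines, . will not cross a line break by default and the pattern above finds nothing. A minimal sketch for that case, assuming each header sits on its own line; the shortened headers and the names multiline_regex, text and sections are illustrative only:

```python
import re

# Hypothetical input with real newlines (headers shortened for readability).
text = (
    "|-----Title1(a)-----\n"
    "\n"
    "Content goes here, the quick brown fox jumps over the lazy dog\n"
    "\n"
    "|-----Title1(b)-----\n"
    "\n"
    "Content goes here, the quick brown fox jumps over the lazy dog\n"
)

# DOTALL lets . cross newlines; MULTILINE lets ^ match at each line start.
# The capture stops before the next line that starts with | or at the end.
multiline_regex = re.compile(
    r"\|-{2,}[^-\n]+-{2,}\n(.*?)(?=^\||\Z)",
    re.DOTALL | re.MULTILINE,
)

# Strip the blank lines that surround each section body.
sections = [body.strip() for body in multiline_regex.findall(text)]
print(sections[0])
print(sections[1])
```

With this variant no trailing | is needed, because \Z ends the last section at the end of the string.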

If Title should be part of the line, another option is to make the match a bit more precise.

\|-+Title\d+\([a-z]\)-+(.+?)(?=\||$)

Regex demo
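
A short sketch of how the more precise pattern might be used. The shortened input and the names title_regex, text and sections are assumptions for illustration. The (?=\||$) lookahead means the last section is still captured when no trailing | is present.

```python
import re

# The more precise pattern: headers must look like Title<digits>(<letter>).
title_regex = r"\|-+Title\d+\([a-z]\)-+(.+?)(?=\||$)"

# Hypothetical shortened input; there is no trailing | after the last section.
text = "|---Title1(a)---first section|---Title1(b)---second section"

# Each capture ends at the next | or at the end of the string.
sections = re.findall(title_regex, text)
print(sections)
```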
