
Python regex combine two lines into one

I'm scraping info from a webpage and trying to combine two lines of output into one line. I've been trying to do this with regex patterns, though I'm not sure whether that's possible, or whether there's a better way. The original output is:

Season Dates: Nov 21
2014 to Apr 19

along with other lines above and below, which I would like to keep as separate lines.

For these two lines, I would like to get:

Season Dates: Nov 21 2014 to Apr 19

I've tried:

result2 = re.sub("(Season\sDates:\s[JFMAJASOND][aepuoc][nbrpylgcv]\s[0-9]?[0-9])", '\12[0-9][0-9][0-9]\sto\s[JFMAJASOND][aepuoc][nbrpylgcv]\s[0-9]?[0-9]', result)

The output I get from this is:

[0-9][0-9][0-9]\sto\s[JFMAJASOND][aepuoc][nbrpylgcv]\s[0-9]?[0-9]

I've tried multiple other variations, including assigning the regex to variables, but can't get anything to work.

From what I can find online, I'm not sure whether the replacement value can be a regex pattern; I'm still unclear on that. Is this possible with regex, or is there a better way to do it?
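A short illustration of why the attempt above misbehaves, assuming Python 3.7 or later. The replacement argument of re.sub is a template, not a pattern: only group references such as \1 or \g<1> are special, and everything else is copied literally. Also, in a normal (non-raw) string literal, '\12' is an octal escape for a newline, so it never referred to group 1. The sample strings here are made up for the demonstration.

```python
import re

# In a normal (non-raw) string literal, "\12" is an octal escape: it is a
# newline character, not "group 1 followed by 2".
assert "\12" == "\n"

# In the replacement template only group references such as \1 or \g<1> are
# special; regex syntax like [0-9] is copied into the output literally.
print(re.sub(r"(\d+)", r"<\1>[0-9]", "Nov 21"))  # Nov <21>[0-9]

# To write group 1 followed by a literal "2", use \g<1>2
# (r"\12" would be read as a reference to group 12).
print(re.sub(r"(Nov 21)", r"\g<1>2", "Nov 21"))  # Nov 212

# Since Python 3.7, unknown escapes such as \s in the template raise re.error.
try:
    re.sub(r"x", r"\s", "x")
except re.error as exc:
    print("re.error:", exc)
```

So the text you want to match across both lines belongs in the pattern, and the replacement only reassembles the captured groups.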

Try this:

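import re

# Month lists use "Dec" (not "Dev"); raw strings keep \s, \d and \g<n> intact.
r = re.compile(r'(Season\sDates):\s(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s(\d+)\s*$\s*(\d+)\s+to\s+(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+(\d+)', re.MULTILINE)
p = """Season Dates: Nov 21
2014 to Apr 19"""
print(r.sub(r'\g<1>: \g<2> \g<3> \g<4> to \g<5> \g<6>', p))  # Season Dates: Nov 21 2014 to Apr 19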

If you want, you can also capture the ":" and the "to", or merge some of the groups. Let me know if you need more or something different.
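One way to follow that suggestion, sketched here as an assumption rather than the answerer's own code: capture the colon (together with the label) and the "to" as groups, then pass a function as the replacement so all groups are joined with single spaces. The MONTH and pattern names are illustrative.

```python
import re

# Non-capturing month alternation, reused for the start and end dates.
MONTH = r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)"

# Every piece, including "Season Dates:" and "to", is its own group, so the
# replacement can simply join all groups with single spaces.
pattern = re.compile(
    rf"(Season\sDates:)\s({MONTH})\s(\d+)\s*$\s*(\d+)\s+(to)\s+({MONTH})\s+(\d+)",
    re.MULTILINE,
)

p = """Season Dates: Nov 21
2014 to Apr 19"""

# re.sub accepts a callable; it receives each match object and returns the text.
print(pattern.sub(lambda m: " ".join(m.groups()), p))
# Season Dates: Nov 21 2014 to Apr 19
```

A callable replacement is handy when the output needs logic (reordering, reformatting dates) that a plain template cannot express.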

re.sub(r"\n"," ",test_str)

If the use case is really this simple, you can just do this. See demo:

https://regex101.com/r/fX3oF6/1
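A quick check of the one-liner above. It replaces every newline in the string, so it only fits when the input is exactly the two lines to join. The lines "Name: Example" and "Lifts: 12" are hypothetical neighbours added to show what happens to surrounding output.

```python
import re

test_str = """Season Dates: Nov 21
2014 to Apr 19"""
print(re.sub(r"\n", " ", test_str))
# Season Dates: Nov 21 2014 to Apr 19

# With other (hypothetical) lines around it, every newline is replaced,
# so all lines merge into one.
page = "Name: Example\nSeason Dates: Nov 21\n2014 to Apr 19\nLifts: 12"
print(re.sub(r"\n", " ", page))
# Name: Example Season Dates: Nov 21 2014 to Apr 19 Lifts: 12
```

For this fixed-character case, test_str.replace("\n", " ") gives the same result without regex.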

EDIT:

If there are more lines than these two, use:

 (\bSeason\s+Dates:\s*\S+\s+\d+)\n(\d+\s+to\s+\S+\s+\d+)

Replace with \\1 \\2. See demo:

https://regex101.com/r/fX3oF6/7
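A sketch of the EDIT's pattern applied in Python. Only the newline between the two captured fragments is consumed, so the surrounding lines stay separate. The page text, including "Name: Example" and "Lifts: 12", is hypothetical, and the sketch assumes "\n" line endings (for "\r\n" input, \r?\n in place of \n would be needed).

```python
import re

# Pattern from the EDIT: group 1 is "Season Dates: <month> <day>",
# group 2 is "<year> to <month> <day>"; the \n between them is dropped.
SEASON_DATES = re.compile(r"(\bSeason\s+Dates:\s*\S+\s+\d+)\n(\d+\s+to\s+\S+\s+\d+)")

# Hypothetical scraped output with unrelated lines before and after.
page = """Name: Example
Season Dates: Nov 21
2014 to Apr 19
Lifts: 12"""

# "\\1 \\2" in a normal string is the same text as r"\1 \2".
assert "\\1 \\2" == r"\1 \2"
print(SEASON_DATES.sub(r"\1 \2", page))
# Name: Example
# Season Dates: Nov 21 2014 to Apr 19
# Lifts: 12
```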

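Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.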

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM