
Splitting a string into 2 strings using a regular expression

Good morning. I have a question: using web scraping, I extract a piece of information as a string like this:

"Issued May2018No expiration date"

I want to split this string into two strings using a regular expression. My idea is: wherever 4 digits are followed by "No", I want to produce the following string:

"Issued May2018 - No expiration date".

That way, I can call the "split" method on " - " to get two strings:

  • Issued May2018
  • No expiration date

I was thinking of using the regex

\d\d\d\dNo

which should match 2018No, but I don't know how to proceed so that I can replace it with

May2018 - No expiration date

and set the stage for using the split function.

Any suggestions? Other approaches are welcome.

You can use a capture group to capture the 4 digits, followed by a literal No and a space.

In the replacement, use the value of capture group 1 followed by " - No ".

import re

s = "Issued May2018No expiration date"
pattern = r"(\d{4})No "
print(re.sub(pattern, r"\1 - No ", s))

Output

Issued May2018 - No expiration date
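
To finish the job the question describes, the substitution can be chained with str.split. This is a minimal sketch, assuming the scraped text always contains exactly one 4-digit year directly followed by "No " (the variable names are illustrative, not from the answer):

```python
import re

s = "Issued May2018No expiration date"
# Step 1: insert " - " between the four digits and "No".
separated = re.sub(r"(\d{4})No ", r"\1 - No ", s)
# Step 2: split on the separator; maxsplit=1 keeps any later " - " intact.
issued, expiration = separated.split(" - ", 1)
print(issued)      # Issued May2018
print(expiration)  # No expiration date
```

If the pattern does not match, the string is left unchanged and the two-name unpacking raises ValueError, so real scraped data may need a check before unpacking.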


Use re.sub.

In the replacement string passed as the repl argument of re.sub(), \g<1> refers to the text matched by capture group 1, and \g<2> to the text matched by group 2.

import re

s = "Issued May2018No expiration date"
# Raw strings keep \d and \g from being treated as string escape sequences.
print(re.sub(r"(\d{4})(No)", r"\g<1> - \g<2>", s))

# 'Issued May2018 - No expiration date'
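
The \g<...> syntax also accepts group names, which can make the replacement easier to read. Here is a small sketch of the same substitution with named groups; the names "year" and "rest" are chosen here for illustration and do not come from the answer:

```python
import re

s = "Issued May2018No expiration date"
# Named groups: "year" holds the four digits, "rest" holds the literal "No".
pattern = r"(?P<year>\d{4})(?P<rest>No)"
result = re.sub(pattern, r"\g<year> - \g<rest>", s)
print(result)  # Issued May2018 - No expiration date
```

The \g<1> form is also useful when a literal digit has to follow the group reference: \g<1>0 means group 1 followed by "0", whereas \10 would be read as a reference to group 10.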

import re

string = "Issued May2018No expiration date"

m = re.findall(r"^(.*[0-9]{4})(No.*)$", string)

print(m[0][0] + " - " + m[0][1])

Output

Issued May2018 - No expiration date
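
Since the end goal is two separate strings, it may be possible to skip the " - " step altogether and split at the boundary itself. This is a sketch under the assumption that Python 3.7 or newer is available (older versions do not split on empty matches); the helper name split_issued_and_expiry is hypothetical:

```python
import re

def split_issued_and_expiry(text):
    # Hypothetical helper (not from the answers above): split at the empty
    # position that has four digits before it and "No" after it.
    # Splitting on a zero-width match needs Python 3.7 or newer.
    return re.split(r"(?<=\d{4})(?=No)", text, maxsplit=1)

parts = split_issued_and_expiry("Issued May2018No expiration date")
print(parts)  # ['Issued May2018', 'No expiration date']
```

When the boundary is not found, re.split returns a one-element list containing the original string, so the caller can check the length before using the parts.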
