Extract a substring from a string using regex

I have a requirement to extract substrings from a string using a regex.

For example, here is my sample data:

Hello, "How" are "you" What "are" you "doing?"

From this sample data, I need to extract only the second and fourth occurrences of double-quoted text.

The expected output is: you doing?

I tried the regex below, but I was unable to extract what I need.

"(.*?)"

We can use re.findall and then slice the result to get the second and fourth matches:

import re

string = 'Hello, "How" are "you" What "are" you "doing?"'
result = re.findall('".+?"', string)[1::2]

print(result)

Here, the regex matches one or more characters enclosed in double quote marks, but matches as few as possible (a non-greedy match). Otherwise we would end up with a single match: "How" are "you" What "are" you "doing?".

Output:

['"you"', '"doing?"']
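To see why the non-greedy quantifier matters, here is a small comparison sketch. The variable names greedy and lazy are illustrative and not part of the original answer.

```python
import re

text = 'Hello, "How" are "you" What "are" you "doing?"'

# Greedy: .+ runs on to the last quote, so everything collapses into one match.
greedy = re.findall('".+"', text)

# Non-greedy: .+? stops at the nearest closing quote, one match per quoted term.
lazy = re.findall('".+?"', text)

print(greedy)
print(lazy)
```

Slicing lazy with [1::2] then picks the matches at indices 1 and 3, which are the second and fourth quoted terms.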

If you want to combine them without the quote marks, you can use str.strip along with str.join :

print(' '.join(s.strip('"') for s in result))

Output:

you doing?

An alternative method would be to just split on " :

result = string.split('"')[1::2][1::2]
print(result)

Output:

['you', 'doing?']

This works because, if you separate the string by double quote marks, then the output will be as follows:

  1. Everything before the first double quote
  2. Everything after the first double quote and before the second
  3. Everything after the second double quote and before the third ...

This means that we can take every second element, starting from the second one (index 1 in Python), to get the pieces that were inside quotes. We can then slice that result the same way again to get the 2nd and 4th quoted terms.
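The intermediate lists make the double slice easier to follow. This is a minimal sketch that assumes the quotes in the input are balanced. The names parts and quoted are illustrative.

```python
text = 'Hello, "How" are "you" What "are" you "doing?"'

# Splitting on the quote character alternates between outside and inside text.
parts = text.split('"')
print(parts)

# Odd indices (1, 3, 5, ...) hold the text that was inside quotes.
quoted = parts[1::2]
print(quoted)

# Slicing again keeps the 2nd, 4th, ... quoted terms.
print(quoted[1::2])
```

Note that parts ends with an empty string here, because the input ends with a closing quote.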

A regex-only solution. It may not be exactly what you need, since it captures the second quoted term of every pair rather than only the 2nd and 4th, but it works for the example.

"[^"]+"[^"]+("[^"]+")

Demonstration in JS:

var str = 'Hello, "How" are "you" What "are" you "doing?"';
var regex = /"[^"]+"[^"]+("[^"]+")/g;
var match = regex.exec(str);
while (match != null) {
  // matched text: match[0]
  // match start: match.index
  // capturing group n: match[n]
  console.log(match[1]);
  match = regex.exec(str);
}
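For comparison with the Python answers, the same pattern can be tried with re.findall. This is only a sketch of the regex above and the variable names are illustrative. When a pattern has exactly one capturing group, re.findall returns the text of that group for each match.

```python
import re

text = 'Hello, "How" are "you" What "are" you "doing?"'

# Skip one quoted term and the text after it, then capture the next quoted term.
pattern = re.compile(r'"[^"]+"[^"]+("[^"]+")')

second_of_each_pair = pattern.findall(text)
print(second_of_each_pair)
```

As with the JS version, the captured values still include the surrounding quote marks.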

We can try using re.findall to extract all quoted terms. Then, build a string using only even entries in the resulting list:

import re

text = 'Hello, "How" are "you" What "are" you "doing?"'
matches = re.findall(r'"([^"]+)"', text)
matches = matches[1::2]  # keep the 2nd, 4th, ... quoted terms
output = " ".join(matches)
print(output)

you doing?
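The [1::2] slice keeps every second quoted term, so a longer input would also return the 6th, 8th, and so on. If strictly the 2nd and 4th terms are wanted, one possible sketch is to index by position. The helper pick_quoted and the extended sample string are hypothetical and do not come from the original answers.

```python
import re

def pick_quoted(text, positions):
    """Return the quoted terms at the given 1-based positions (hypothetical helper)."""
    terms = re.findall(r'"([^"]+)"', text)
    # Ignore positions that fall outside the list of quoted terms.
    return [terms[p - 1] for p in positions if 1 <= p <= len(terms)]

# Extended sample with six quoted terms, made up for illustration.
text = 'Hello, "How" are "you" What "are" you "doing?" and "more" and "still more"'

print(" ".join(pick_quoted(text, [2, 4])))
```

Out-of-range positions are skipped silently here. Raising an error instead would be an equally reasonable choice.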
