Python Regex to find a string in double quotes within a string

I'm looking for Python code that uses a regex to do something like this:

Input: Regex should return "String 1" or "String 2" or "String3"

Output: String 1,String 2,String3

I tried r'"*"'

Here's all you need to do:

import re

def doit(text):
    matches = re.findall(r'"(.+?)"', text)
    # matches is now ['String 1', 'String 2', 'String3']
    return ",".join(matches)

doit('Regex should return "String 1" or "String 2" or "String3" ')


'String 1,String 2,String3'

As pointed out by Li-aung Yip:

To elaborate, .+? is the "non-greedy" version of .+. It makes the regular expression match the fewest characters it can instead of the most. The greedy version, .+, gives String 1" or "String 2" or "String3; the non-greedy version, .+?, gives String 1, String 2, String3.
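The difference is easy to see by running both forms on the question's input. A minimal sketch (the variable names greedy and lazy are only illustrative):

```python
import re

text = 'Regex should return "String 1" or "String 2" or "String3"'

# Greedy: .+ runs on to the last double quote in the text.
greedy = re.findall(r'"(.+)"', text)

# Non-greedy: .+? stops at the first closing double quote it can reach.
lazy = re.findall(r'"(.+?)"', text)

print(greedy)  # ['String 1" or "String 2" or "String3']
print(lazy)    # ['String 1', 'String 2', 'String3']
```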

In addition, if you want to accept empty strings, change .+? to .*?. Star * means zero or more, while plus + means at least one.
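The choice matters as soon as the input contains an empty pair of quotes. A small sketch with a made-up sample string shows that .+? does not merely skip the empty string; it pairs up the wrong quotes:

```python
import re

sample = 'a "" b "x"'

# .+? needs at least one character, so it swallows the second quote
# and closes on the quote that actually opens "x".
plus = re.findall(r'"(.+?)"', sample)

# .*? may match nothing, so the empty string is reported as such.
star = re.findall(r'"(.*?)"', sample)

print(plus)  # ['" b ']
print(star)  # ['', 'x']
```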

The highly up-voted answer doesn't account for the possibility that the double-quoted string might contain one or more double-quote characters (properly escaped, of course). To handle this situation, the regex needs to accumulate characters one by one, with a negative lookahead assertion stating that the current character is not a double-quote character that is not preceded by a backslash (which in turn requires a negative lookbehind assertion):

"(?:(?:(?!(?<!\\)").)*)"

import re
import ast

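def doit(text):
    # The pattern keeps the surrounding quotes so ast.literal_eval can parse each match.
    matches = re.findall(r'"(?:(?:(?!(?<!\\)").)*)"', text)
    for match in matches:
        print(match, '=>', ast.literal_eval(match))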

doit('Regex should return "String 1" or "String 2" or "String3" and "\\"double quoted string\\"" ')


"String 1" => String 1
"String 2" => String 2
"String3" => String3
"\"double quoted string\"" => "double quoted string"
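To see why the extra assertions are needed, it helps to run the simple pattern and the escape-aware pattern side by side. A minimal sketch on a made-up input that contains escaped quotes:

```python
import re

text = r'say "a \"quoted\" word" now'

# The simple pattern closes on the first quote it sees, escaped or not.
simple = re.findall(r'"(.+?)"', text)

# The lookaround pattern skips any quote that follows a backslash.
# It has no capturing group, so findall returns the whole match, quotes included.
escape_aware = re.findall(r'"(?:(?:(?!(?<!\\)").)*)"', text)

print(simple)        # ['a \\', ' word']
print(escape_aware)  # ['"a \\"quoted\\" word"']
```

Note that the lookbehind only checks the single preceding character, so a string whose last character is an escaped backslash would still be misread.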

To fetch the double-quoted strings from a multiline string:

import re

s = """ 
"my name is daniel"  "mobile 8531111453733"[[[[[[--"i like pandas"
"location chennai"! -asfas"aadhaar du2mmy8969769##69869" 
@4343453 "pincode 642002""@mango,@apple,@berry" 
"""
print(re.findall(r'"(.*?)"', s))
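This works when each quoted string sits on a single line. If a quoted string may itself span a line break, keep in mind that . does not match a newline by default. A small sketch with a made-up input, showing two common ways around that (the re.DOTALL flag and a negated character class):

```python
import re

s = 'before "spans\ntwo lines" after "one line"'

# By default . stops at the newline, so the quotes get paired up wrongly.
default = re.findall(r'"(.*?)"', s)

# re.DOTALL lets . match newlines as well.
dotall = re.findall(r'"(.*?)"', s, re.DOTALL)

# [^"] matches any character except a double quote, newlines included.
negated = re.findall(r'"([^"]*)"', s)

print(default)  # [' after ']
print(dotall)   # ['spans\ntwo lines', 'one line']
print(negated)  # ['spans\ntwo lines', 'one line']
```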

From https://stackoverflow.com/a/69891301/1531728

My solution is:

import re
my_strings = ['SetVariables "a" "b" "c" ', 'd2efw   f "first" +&%#$%"second",vwrfhir, d2e   u"third" dwedew', '"uno"?>P>MNUIHUH~!@#$%^&*()_+=0trewq"due"        "tre"fef    fre f', '       "uno""dos"      "tres"', '"unu""doua""trei"', '      "um"                    "dois"           "tres"                  ']
my_substrings = []
for current_test_string in my_strings:
    for values in re.findall(r'\"(.+?)\"', current_test_string):
        #print("values are:",values,"=")
        # Collect each quoted substring found in the current test string.
        my_substrings.append(values)
    print(" my_substrings are:",my_substrings,"=")
    my_substrings = []

Alternate regular expressions to use are:

  • re.findall('"(.+?)"', current_test_string) [Avinash2021] [user17405772021]
  • re.findall('"(.*?)"', current_test_string) [Shelvington2020]
  • re.findall(r'"(.*?)"', current_test_string) [Lundberg2012] [Avinash2021]
  • re.findall(r'"(.+?)"', current_test_string) [Lundberg2012] [Avinash2021]
  • re.findall(r'"["]', current_test_string) [Muthupandi2019]
  • re.findall(r'"([^"]*)"', current_test_string) [Pieters2014]
  • re.findall(r'"(?:(?:(?!(?<!\\)").)*)"', current_test_string) # Causes double quotes to remain in the strings, but can be removed via other means. [Booboo2020]
  • re.findall(r'"(.*?)(?<!\\)"', current_test_string) [Hassan2014]
  • re.findall('"[^"]*"', current_test_string) # Causes double quotes to remain in the strings, but can be removed via other means. [Martelli2013]
  • re.findall('"([^"]*)"', current_test_string) [jspcal2014]
  • re.findall("'(.*?)'", current_test_string) [akhilmd2016]

The current_test_string.split("\"") approach works only if every substring of interest is embedded within double quotation marks. It uses the double quotation mark as a delimiter to tokenize the string, so it also returns the text that is not embedded within double quotation marks as if it were a valid extraction.
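A short sketch of the split approach, assuming the quotes are balanced and never escaped: the pieces at odd indices are the quoted substrings, and everything else is the surrounding text that has to be discarded. The names pieces and quoted are only illustrative.

```python
s = 'd2efw f "first" +&%#$%"second",vwrfhir u"third" dwedew'

# Splitting on the double quote returns quoted and unquoted text alike.
pieces = s.split('"')

# With balanced, unescaped quotes, every second piece was inside quotes.
quoted = pieces[1::2]

print(pieces)  # ['d2efw f ', 'first', ' +&%#$%', 'second', ',vwrfhir u', 'third', ' dwedew']
print(quoted)  # ['first', 'second', 'third']
```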


For me the only regex that ever worked right for all the cases of quoted strings with possibly escaped quotes inside of them was:


This will not fail even if the quoted string ends with an escaped backslash.

import re


for text in texts:
    print(text, "-->", re.fullmatch(r, text))


"aerrrt" --> <_sre.SRE_Match object; span=(0, 8), match='"aerrrt"'>
"a\"e'rrt" --> <_sre.SRE_Match object; span=(0, 10), match='"a\\"e\'rrt"'>
"a""""arrtt""""" --> None
"aerrrt --> None
"a\"errt' --> None
'aerrrt' --> <_sre.SRE_Match object; span=(0, 8), match="'aerrrt'">
'a\'e"rrt' --> <_sre.SRE_Match object; span=(0, 10), match='\'a\\\'e"rrt\''>
'a''''arrtt''''' --> None
'aerrrt --> None
'a\'errt" --> None
'' --> <_sre.SRE_Match object; span=(0, 2), match="''">
"" --> <_sre.SRE_Match object; span=(0, 2), match='""'>
 --> None
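The results above are consistent with a pattern that treats a backslash together with the character after it as a single unit. The following is a sketch under that assumption; the QUOTED pattern and the is_quoted helper are hypothetical names chosen for illustration, not necessarily the regex this answer used.

```python
import re

# Assumed pattern: an opening quote, then any run of escaped characters (\\.)
# or characters that are neither a backslash nor that quote, then the same quote.
QUOTED = re.compile(
    r'"(?:\\.|[^"\\])*"'    # double-quoted string
    r"|'(?:\\.|[^'\\])*'"   # single-quoted string
)

def is_quoted(text):
    """Return True if the whole text is one well-formed quoted string."""
    return QUOTED.fullmatch(text) is not None

texts = [
    r'"aerrrt"',          # plain double-quoted string
    r'''"a\"e'rrt"''',    # escaped double quote inside
    '"a""""arrtt"""""',   # unescaped quotes inside
    '"aerrrt',            # missing closing quote
    r'"abc\\"',           # ends with an escaped backslash
    "''",                 # empty single-quoted string
    '',                   # empty input
]

for text in texts:
    print(text, "-->", is_quoted(text))
```

Because an escaped backslash is consumed as a pair, the quote that follows it is correctly read as the closing quote, which is the case a single-character lookbehind gets wrong.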

