I am trying to parse a PDF in Python and extract the strings in quotation marks. I can extract the quoted text, but I also want to extract the name that comes before the opening quotation mark. For example, consider this:
Ziblatt, Daniel. 2004. "Rethinking the Origins of Federalism: Puzzle, Theory, and Evidence from Nineteenth-Century Europe,"
I can extract everything inside the quotation marks, but I want the name extracted as well. This is the code I am using. Please help.
import re

def quotes(x):
    quoted = re.compile('"[^"]*"')
    for value in quoted.findall(x):
        print(value)
Capturing data before a double-quote should work:
import re

def quotes(x):
    # (.+) captures the text before the quoted string; findall returns only that group
    quoted = re.compile('(.+)"[^"]+"')
    for value in quoted.findall(x):
        print(value.strip())
I get this output:
>>> quotes(text)
'Ziblatt, Daniel. 2004.'
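A possible extension, offered as a sketch rather than a definitive solution: the pattern above returns only the text before the quote, and its greedy (.+) runs up to the last quoted string on a line. If both parts are needed, two capture groups can return them together. The names CITATION and prefix_and_quote are hypothetical. The sketch assumes each citation starts on its own line and uses straight double quotes; PDF extraction often yields curly quotes, which would have to be added to the pattern.

```python
import re

# Hypothetical pattern: group 1 is the text from the start of a line up to the
# first double quote, group 2 is the text inside that pair of quotes.
CITATION = re.compile(r'^([^"\n]*)"([^"]*)"', re.MULTILINE)

def prefix_and_quote(text):
    """Return (prefix, quoted) tuples for the first quoted string on each line."""
    return [(m.group(1).strip(), m.group(2)) for m in CITATION.finditer(text)]

text = ('Ziblatt, Daniel. 2004. "Rethinking the Origins of Federalism: '
        'Puzzle, Theory, and Evidence from Nineteenth-Century Europe,"')

for prefix, quoted in prefix_and_quote(text):
    print(prefix)
    print(quoted)
```

Returning tuples instead of printing inside the function keeps the name and the title paired, so they can be written to a file or filtered later.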