Extracting string before the quotations

I am trying to parse a PDF in Python and extract the strings that appear in quotation marks. I can extract the quoted text, but I also want to extract the name that comes before the quotation starts. For example, consider this:

Ziblatt, Daniel. 2004. "Rethinking the Origins of Federalism: Puzzle, Theory, and Evidence from Nineteenth-Century Europe,"

I am able to extract everything inside the quotation marks, but I want the name to be extracted as well. This is the code I am using. Please help.

import re

def quotes(x):
    # Match each run of text enclosed in straight double quotes.
    quoted = re.compile(r'"[^"]*"')
    for value in quoted.findall(x):
        print(value)

Capturing the text before the opening double quote should work:

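import re

def quotes(x):
    # With one capture group, findall returns only that group: the text before the quoted part.
    quoted = re.compile(r'(.+)"[^"]+"')
    for value in quoted.findall(x):
        print(value.strip())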

I get this output:

>>> quotes(text)
Ziblatt, Daniel. 2004.
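
If both the name and the quoted title are needed together, one possible variation is to use two capture groups and a non-greedy prefix, so that each line yields a (prefix, title) pair. This is only a sketch: the helper name `prefix_and_title` and the `CITATION` pattern are made up for illustration, and it assumes straight double quotes (") with each citation starting on its own line. Text extracted from a PDF may contain curly quotes instead, which this pattern would not match.

```python
import re

# Hypothetical pattern: group 1 is the text before the opening quote,
# group 2 is the quoted text. Assumes straight double quotes and that
# each citation starts at the beginning of a line.
CITATION = re.compile(r'^\s*([^"\n]+?)\s*"([^"]+)"', re.MULTILINE)

def prefix_and_title(text):
    """Return a list of (text before the quote, quoted text) pairs."""
    return [(m.group(1), m.group(2)) for m in CITATION.finditer(text)]

sample = ('Ziblatt, Daniel. 2004. "Rethinking the Origins of Federalism: '
          'Puzzle, Theory, and Evidence from Nineteenth-Century Europe,"')

for prefix, title in prefix_and_title(sample):
    print(prefix)
    print(title)
```

The non-greedy `+?` keeps the prefix from running past the first opening quote, which matters if a line ever contains more than one quoted passage.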
