
Regular expression for a list separated by ',' or 'and'

I have a long list of citations from which I need to extract each author's full name, the year published, the title, etc. One of the citations looks like this:

Joe Bob, Jane Doe and George H. Smith (2017). A title of an interesting report: Part 2. Report Series no. 101, Place for Generating Reports, Department of Report Makers, City, Province, Country, 44 pages. ISBN: (print) 123-0-1234-1234-5; (online) 123-0-1234-1234-5.

All of the citations are formatted the same way. The part I am stuck on right now is extracting the authors' full names. I read here about how to split a comma-, space-, or semicolon-separated list with something like [\s,;]+. How would I do something similar for a comma or the word 'and'?
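For reference, the character-class split mentioned above looks like this with Python's standard `re` module (a minimal sketch; the sample string is made up for illustration):

```python
import re

# [\s,;]+ matches a run of one or more characters, each of which is
# whitespace, a comma, or a semicolon; re.split cuts the string at every run
sample = "alpha, beta;gamma  delta"
parts = re.split(r"[\s,;]+", sample)
print(parts)  # ['alpha', 'beta', 'gamma', 'delta']
```

Because whitespace is one of the separators, the same pattern would also break "George H. Smith" into `['George', 'H.', 'Smith']`, so it cannot be reused as-is for full names.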

I assume that 'and' needs to be treated as a group of characters, so I tried [^,|[and])]+ to match the text between either a comma or the character set [and], but this doesn't seem to work. This question is similar in that it deals with a comma or a space, but its solution relies on the spaces being stripped implicitly.
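A short Python sketch of why that attempt fails and what works instead (standard `re` module; the sample strings are only illustrations): inside square brackets each character is a separate alternative, so `[and]` matches a single 'a', 'n', or 'd' rather than the word. Matching "a comma or the word and" needs alternation inside a group.

```python
import re

names = "Joe Bob, Jane Doe and George H. Smith"

# Inside [...] each character is its own alternative: [and] matches a single 'a', 'n' or 'd'
letters = re.findall(r"[and]", "Jane and Dan")
print(letters)  # ['a', 'n', 'a', 'n', 'd', 'a', 'n']

# A non-capturing group with | matches either a comma or the whole word "and";
# the \b word boundaries keep it from matching the "and" inside a name such as "Sandy"
split_names = re.split(r"\s*(?:,|\band\b)\s*", names)
print(split_names)  # ['Joe Bob', 'Jane Doe', 'George H. Smith']
```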

After getting this portion working, I plan to build the rest of the expression to capture the other citation details. So assume that the string we are dealing with is simply:

Joe Bob, Jane Doe and George H. Smith

and each full name should be captured.

Here is one possible approach:

citation = """Joe Bob, Jane Doe and George H. Smith (2017). A title of an interesting report: Part 2. Report Series no. 101, Place for Generating Reports, Department of Report Makers, City, Province, Country, 44 pages. ISBN: (print) 123-0-1234-1234-5; (online) 123-0-1234-1234-5."""

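# Turn " and " into a comma so every name is comma-separated
citation = citation.replace(' and ', ',')
# Keep only the author list: everything before the "(2017)" year
citation = citation[:citation.find('(')]

# Split on commas and trim the surrounding whitespace from each name
names = [name.strip() for name in citation.split(',')]

print(names)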

Giving you:

['Joe Bob', 'Jane Doe', 'George H. Smith']

Convert ' and ' into a comma, slice up to where the year starts, then split on commas.

Or in a more compact form:

names = [name.strip() for name in citation[:citation.find('(')].replace(' and ', ',').split(',')]
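Since the question asks for a regular expression, here is a regex-only sketch in Python that could serve as a starting point for the fuller pattern the question plans to build. It assumes, as the question states, that every citation starts with the author list followed by a four-digit year in parentheses; the names `match` and `SEPARATOR` are illustrative. Unlike `citation.find('(')`, which returns -1 when there is no parenthesis (so the slice silently drops the last character), `re.match` returns `None`, which the code checks for.

```python
import re

citation = ("Joe Bob, Jane Doe and George H. Smith (2017). "
            "A title of an interesting report: Part 2.")

# Authors are everything before the first "(YYYY)"; the year is captured separately
match = re.match(r"(?P<authors>.+?)\s*\((?P<year>\d{4})\)", citation)

# A separator is a comma (optionally followed by "and", for serial commas) or a bare "and"
SEPARATOR = re.compile(r"\s*,\s*(?:and\s+)?|\s+and\s+")

if match:
    names = SEPARATOR.split(match.group("authors"))
    print(names)                # ['Joe Bob', 'Jane Doe', 'George H. Smith']
    print(match.group("year"))  # 2017
```

The optional `and` after the comma also handles a serial comma: `SEPARATOR.split("A, B, and C")` gives `['A', 'B', 'C']`, whereas `replace(' and ', ',')` would leave an empty string between 'B' and 'C'.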
