I have a long list of citations for which I need to extract each author's full name, year published, title, etc. One of the citations looks like this:
Joe Bob, Jane Doe and George H. Smith (2017). A title of an interesting report: Part 2. Report Series no. 101, Place for Generating Reports, Department of Report Makers, City, Province, Country, 44 pages. ISBN: (print) 123-0-1234-1234-5; (online) 123-0-1234-1234-5.
All of the citations are formatted the same way. The part I am stuck on right now is extracting the authors' full names. I read here about how to extract values from a comma-, space-, or semicolon-separated list by doing something like [\\\\s,;]+. How would I do something similar for a comma or the word 'and'?
I assume that 'and' needs to be treated like a group of characters, so I tried [^,|[and])]+ to match the spaces between either , or the character set [and], but this doesn't seem to work. This question is similar in that it deals with a comma or a space, but the solution involves the spaces being stripped implicitly.
After getting this portion down I plan on building the rest of the expression to capture the other citation details. So assume that the string we are dealing with is simply:
Joe Bob, Jane Doe and George H. Smith
and each full name should be captured.
Here is one possible approach:
citation = """Joe Bob, Jane Doe and George H. Smith (2017). A title of an interesting report: Part 2. Report Series no. 101, Place for Generating Reports, Department of Report Makers, City, Province, Country, 44 pages. ISBN: (print) 123-0-1234-1234-5; (online) 123-0-1234-1234-5."""
citation = citation.replace(' and ', ',')
citation = citation[:citation.find('(')]
names = [name.strip() for name in citation.split(',')]
print(names)
Giving you:
['Joe Bob', 'Jane Doe', 'George H. Smith']
Convert ' and ' into a comma, slice up to where the year starts, and split on a comma.
Or in a more compact form:
names = [name.strip() for name in citation[:citation.find('(')].replace(' and ', ',').split(',')]
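If a regular expression is preferred, the same split can be sketched with re.split. A character class such as [and] matches a single character (a, n, or d), not the word, so the word has to be written as an alternative in the pattern instead. The function name split_authors is hypothetical, and the sketch assumes the author list ends at the first '(' and that no author name contains a comma; it also tolerates a comma before 'and'.

```python
import re

def split_authors(citation):
    # Keep only the text before the first '(' (where the year starts);
    # if there is no '(', partition returns the whole string.
    authors = citation.partition('(')[0].strip()
    # Split on a comma (optionally followed by "and"), or on the
    # standalone word "and" surrounded by whitespace.
    parts = re.split(r'\s*,\s*(?:and\s+)?|\s+and\s+', authors)
    # Drop empty pieces, e.g. from a trailing comma.
    return [part for part in parts if part]

names = split_authors("Joe Bob, Jane Doe and George H. Smith (2017). A title of an interesting report: Part 2.")
print(names)
```

Because 'and' must be surrounded by whitespace to count as a separator, names that merely contain those letters (for example Sandy or Anderson) are left intact.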