Complex group regex pattern in Java
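I have developed a regex pattern to parse the bibliographies of scientific articles. We use the AMA citation style, and a journal citation can look like this: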
"Nielsen MK, Neergaard MA, Jensen AB, Bro F, Guldin MB. Psychological distress, health, and socio-economic factors in caregivers of terminally ill patients: a nationwide population-based cohort study. Support Care Cancer. 2016; 24(7): 3057-3067."
Or without the issue number:
"Nielsen MK, Neergaard MA, Jensen AB, Bro F, Guldin MB. Psychological distress, health, and socio-economic factors in caregivers of terminally ill patients: a nationwide population-based cohort study. Support Care Cancer. 2016; 24: 3057-3067."
Or with only a first page (an electronic article number):
"Nielsen MK, Neergaard MA, Jensen AB, Bro F, Guldin MB. Psychological distress, health, and socio-economic factors in caregivers of terminally ill patients: a nationwide population-based cohort study. Support Care Cancer. 2016; 24(7): 3057."
Or with only the volume number (for articles published ahead of print):
"Nielsen MK, Neergaard MA, Jensen AB, Bro F, Guldin MB. Psychological distress, health, and socio-economic factors in caregivers of terminally ill patients: a nationwide population-based cohort study. Support Care Cancer. 2016; 24."
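My pattern matches all of these cases and captures every field in a group (backslashes are doubled because the pattern is a Java string literal):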
(.*?)\\.(.*?)\\.(.*?)(?<year>\\d+)\\s*?;?\\s*?(?:(?<volume>\\d+))?(?:\\((?<issue>\\d+)\\))?\\s*?(?::\\s*?(?<fpage>\\d+|[A-Za-z]+\\d+))?(?:[\\-\\–](?<lpage>\\d+))?\\.
The problem is that some authors always put spaces between the first and last page numbers. Could the pattern be changed to match that as well?
"Nielsen MK, Neergaard MA, Jensen AB, Bro F, Guldin MB. Psychological distress, health, and socio-economic factors in caregivers of terminally ill patients: a nationwide population-based cohort study. Support Care Cancer. 2016; 24(7): 3057 - 3067."
Here is an example; as you can see, the pattern does not match it correctly.
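A minimal Java sketch of the question's pattern in use, assuming it is applied with Matcher.find() and the fields are read through Matcher.group(String). The class AmaCitationParser, the constant SAMPLE_PREFIX and the method parse are hypothetical names, and the en dash in the character class is written as the Unicode escape \u2013 so the source file's encoding does not matter. The first four inputs should parse as intended. In the spaced case no path after the year can consume " - ", so the lazy third group keeps growing until the last number, which then lands in year while every other group stays null.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper class around the question's pattern.
public class AmaCitationParser {
    // The question's pattern as a Java string literal; the en dash is a Unicode escape.
    static final Pattern CITATION = Pattern.compile(
            "(.*?)\\.(.*?)\\.(.*?)(?<year>\\d+)\\s*?;?\\s*?(?:(?<volume>\\d+))?"
            + "(?:\\((?<issue>\\d+)\\))?\\s*?(?::\\s*?(?<fpage>\\d+|[A-Za-z]+\\d+))?"
            + "(?:[\\-\u2013](?<lpage>\\d+))?\\.");

    // Authors, title and journal shared by all of the question's examples.
    static final String SAMPLE_PREFIX = "Nielsen MK, Neergaard MA, Jensen AB, Bro F, Guldin MB. "
            + "Psychological distress, health, and socio-economic factors in caregivers of terminally "
            + "ill patients: a nationwide population-based cohort study. Support Care Cancer. ";

    // Returns the named groups of the first match (a value is null when its optional
    // group did not take part), or null when the pattern does not match at all.
    static Map<String, String> parse(String citation) {
        Matcher m = CITATION.matcher(citation);
        if (!m.find()) {
            return null;
        }
        Map<String, String> fields = new LinkedHashMap<>();
        for (String name : new String[] {"year", "volume", "issue", "fpage", "lpage"}) {
            fields.put(name, m.group(name));
        }
        return fields;
    }

    public static void main(String[] args) {
        String[] tails = {
            "2016; 24(7): 3057-3067.",  // full reference
            "2016; 24: 3057-3067.",     // no issue number
            "2016; 24(7): 3057.",       // first page only
            "2016; 24.",                // volume only
            "2016; 24(7): 3057 - 3067." // spaces around the page-range dash
        };
        for (String tail : tails) {
            System.out.println(tail + " -> " + parse(SAMPLE_PREFIX + tail));
        }
    }
}
```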
The correct regex is:
(.*?)\.(.*?)\.(.*?)(?<year>\d+)\s*?;?\s*?(?:(?<volume>\d+))?(?:\((?<issue>\d+)\))?\s*?(?::\s*?(?<fpage>\d+|[A-Za-z]+\d+))?(?:[ ]*[\-\–][ ]*(?<lpage>\d+))?\.
This regex101 demo shows it solving your problem: https://regex101.com/r/RAdNgb/2. Please check it.
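The same fix as a Java string literal, in a minimal sketch that assumes Matcher.find() as before. The class FixedAmaCitationParser and its members are hypothetical names. The only change to the question's pattern is the optional "[ ]*" on each side of the dash. The sketch also leaves "|" out of the dash class, because inside [...] a "|" is a literal pipe character rather than alternation, and it writes the en dash as the Unicode escape \u2013.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper class around the answer's pattern.
public class FixedAmaCitationParser {
    // "[ ]*" on both sides of the dash accepts "3057-3067" as well as "3057 - 3067".
    // The dash class holds only the hyphen and the en dash (no "|").
    static final Pattern CITATION = Pattern.compile(
            "(.*?)\\.(.*?)\\.(.*?)(?<year>\\d+)\\s*?;?\\s*?(?:(?<volume>\\d+))?"
            + "(?:\\((?<issue>\\d+)\\))?\\s*?(?::\\s*?(?<fpage>\\d+|[A-Za-z]+\\d+))?"
            + "(?:[ ]*[\\-\u2013][ ]*(?<lpage>\\d+))?\\.");

    // Authors, title and journal shared by the question's examples.
    static final String SAMPLE_PREFIX = "Nielsen MK, Neergaard MA, Jensen AB, Bro F, Guldin MB. "
            + "Psychological distress, health, and socio-economic factors in caregivers of terminally "
            + "ill patients: a nationwide population-based cohort study. Support Care Cancer. ";

    // Named groups of the first match (null values for groups that did not take part),
    // or null when there is no match.
    static Map<String, String> parse(String citation) {
        Matcher m = CITATION.matcher(citation);
        if (!m.find()) {
            return null;
        }
        Map<String, String> fields = new LinkedHashMap<>();
        for (String name : new String[] {"year", "volume", "issue", "fpage", "lpage"}) {
            fields.put(name, m.group(name));
        }
        return fields;
    }

    public static void main(String[] args) {
        String[] tails = {
            "2016; 24(7): 3057 - 3067.",      // spaced hyphen
            "2016; 24(7): 3057 \u2013 3067.", // spaced en dash
            "2016; 24(7): 3057-3067.",        // unspaced range is still accepted
            "2016; 24."                       // volume only
        };
        for (String tail : tails) {
            System.out.println(tail + " -> " + parse(SAMPLE_PREFIX + tail));
        }
    }
}
```

Allowing only literal spaces mirrors the answer; "\\s*" would also accept tabs or other whitespace if the source data contains them.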