Regex for a digit plus two characters like '1st', '2nd', '10th', '22nd'?

I have a dataset of phone calls transcribed into text, where each sample contains text. I'm trying to identify all the samples where dates are mentioned. To be clear, I'm only looking for samples that contain a number followed by two additional characters, like "1st", "2nd", "25th".

Right now, I have a rather brute-force approach. It does the job, but is there a cleaner way to achieve the same thing using regex?


import re

def date_mentioned(text):
    # Every ordinal from 1st to 31st, listed by hand
    date_list = ['1st', '2nd', '3rd', '4th', '5th', '6th', '7th', '8th', '9th', '10th', '11th', '12th', '13th', '14th', '15th', '16th', '17th', '18th', '19th', '20th', '21st', '22nd', '23rd', '24th', '25th', '26th', '27th', '28th', '29th', '30th', '31st']

    # Return True as soon as any of them appears anywhere in the text
    for date in date_list:
        if re.search(date, text):
            return True
    return False

You could use a regular expression for this. You may try:

r'\d{1,2}(?:st|nd|rd|th)'


Details

  • \d{1,2}(?:st|nd|rd|th)
    • \d{1,2} matches a digit (equivalent to [0-9])
      • {1,2} Quantifier: matches between 1 and 2 times
    • Non-capturing group (?:st|nd|rd|th)
      • 1st alternative: st
        • st matches the characters st literally (case sensitive)
      • 2nd alternative: nd
        • nd matches the characters nd literally (case sensitive)
      • 3rd alternative: rd
        • rd matches the characters rd literally (case sensitive)
      • 4th alternative: th
        • th matches the characters th literally (case sensitive)
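
As a minimal sketch, the pattern above could replace the hand-written list in date_mentioned like this. The word boundaries (\b) and the re.IGNORECASE flag are additions assumed here, not part of the pattern as posted; they keep the pattern from matching inside longer tokens and allow for transcripts that write "1ST".

```python
import re

# \b and re.IGNORECASE are assumptions of this sketch, not part of the
# pattern as posted in the answer above.
ORDINAL_RE = re.compile(r"\b\d{1,2}(?:st|nd|rd|th)\b", re.IGNORECASE)


def date_mentioned(text):
    """Return True if text contains a one- or two-digit ordinal."""
    return ORDINAL_RE.search(text) is not None
```

Compiling the pattern once is a small convenience when the function is called for every sample in a dataset.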

For general numbers, \d*([02-9]1st|2nd|3rd|([04-9]|1[1-3])th) should do what you want. You can restrict the numbers further for dates, but full validation is complex (months, leap years, etc.), so I'd recommend just blindly parsing the number and then validating it afterwards.

Edit: Thanks for pointing out the mistake with 3rd; fixed.
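
The "parse first, validate afterwards" advice might look roughly like the sketch below. It uses a deliberately loose pattern (any digits followed by one of the four suffixes) and then keeps only numbers that could be a day of the month. The names LOOSE_ORDINAL_RE and mentioned_days are hypothetical, and the 1 to 31 range is an assumed validation rule; it does not check that the suffix agrees with the number.

```python
import re

# Loose on purpose: any run of digits followed by st/nd/rd/th.
LOOSE_ORDINAL_RE = re.compile(r"\b(\d+)(?:st|nd|rd|th)\b")


def mentioned_days(text):
    """Return the numbers in text that look like days of the month (1-31)."""
    days = []
    for match in LOOSE_ORDINAL_RE.finditer(text):
        number = int(match.group(1))
        # Validation happens here, after parsing, rather than in the regex.
        if 1 <= number <= 31:
            days.append(number)
    return days
```

For example, mentioned_days("the 3rd or the 45th, maybe the 31st") keeps 3 and 31 and drops 45.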

You can find these dates with:

[0-9]{1,2}(?:st|nd|rd|th)

Explanation:
1 or 2 digits,
followed by st, nd, rd or th
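
A short sketch of how this pattern could be used to list the matches rather than only test for them. The function name find_ordinal_tokens is hypothetical. Note that without word boundaries the pattern also matches the tail of a longer number.

```python
import re

PATTERN = re.compile(r"[0-9]{1,2}(?:st|nd|rd|th)")


def find_ordinal_tokens(text):
    """Return every substring of text that matches the pattern."""
    # The group is non-capturing, so findall returns the whole match.
    return PATTERN.findall(text)


print(find_ordinal_tokens("We spoke on the 3rd and again on the 17th."))
# → ['3rd', '17th']
print(find_ordinal_tokens("the 123rd caller"))
# → ['23rd']
```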

Since you're looking for ordinal numbers, the rules are:

If the number ends with 1 and is not 11, add 'st'
If the number ends with 2 and is not 12, add 'nd'
If the number ends with 3 and is not 13, add 'rd'
For all other numbers, add 'th'
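
These rules can be sketched directly as a small function. The names ordinal_suffix and to_ordinal are hypothetical, and checking the last two digits (so that 111 also takes 'th') is an assumed generalization beyond the 1 to 31 range the question is about.

```python
def ordinal_suffix(number):
    """Return the English ordinal suffix for a non-negative integer."""
    # 11, 12 and 13 take "th" even though they end in 1, 2 and 3.
    # Using % 100 extends this to 111, 212, ... (an assumption of this sketch).
    if number % 100 in (11, 12, 13):
        return "th"
    return {1: "st", 2: "nd", 3: "rd"}.get(number % 10, "th")


def to_ordinal(number):
    """Format a number with its suffix, e.g. 22 -> '22nd'."""
    return f"{number}{ordinal_suffix(number)}"
```

Generating the 31 valid day strings is then a one-liner: [to_ordinal(n) for n in range(1, 32)].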

A regex that can distinguish between these cases is:

r'^(?:11th|12th|13th|[02-9]?(?:1st|2nd|3rd)|\d?[04-9]th)$'

The alternatives are wrapped in a single group so that ^ and $ apply to all of them, and the optional digit before 1st, 2nd or 3rd excludes 1, so that "11st", "12nd" and "13rd" are rejected.

And the application, which checks whether text as a whole is a valid one- or two-digit ordinal, is:

import re

def date_mentioned(text):
    # The anchors make this a whole-string test, not a search inside a sentence
    pattern = r'^(?:11th|12th|13th|[02-9]?(?:1st|2nd|3rd)|\d?[04-9]th)$'
    return re.match(pattern, text) is not None

RegEx explanation
We're looking for this sequence:

^ : start of the string
(?: : opening of a non-capturing group
11th : the string 11th
| : or
12th : the string 12th
| : or
13th : the string 13th
| : or
[02-9]? : 0 or 1 digits other than 1, followed by
(?:1st|2nd|3rd) : the string 1st, 2nd or 3rd
| : or
\d? : 0 or 1 digits, followed by
[04-9] : one digit that is 0 or in the range 4-9
th : the string th
) : closing of the group
$ : end of the string
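
Because the pattern is anchored to the whole string, one way to use this kind of check on a full transcript is to test each token separately. The sketch below does that with a pattern in the same spirit: all alternatives sit in one group, and the st/nd/rd branch may not follow a leading 1. The name find_ordinals, the whitespace tokenization and the punctuation stripping are assumptions of this sketch.

```python
import re

# fullmatch anchors at both ends, so ^ and $ are not needed here.
VALID_ORDINAL_RE = re.compile(
    r"(?:1[1-3]th|[02-9]?(?:1st|2nd|3rd)|\d?[04-9]th)"
)


def find_ordinals(text):
    """Return the tokens in text that are well-formed 1-2 digit ordinals."""
    found = []
    for token in text.split():
        # Drop punctuation attached to either end of the token.
        token = token.strip(".,;:!?\"'()")
        if VALID_ORDINAL_RE.fullmatch(token):
            found.append(token)
    return found
```

With this, find_ordinals("Let's meet on the 21st, not the 11st or the 13th.") keeps "21st" and "13th" and skips the malformed "11st".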
