Regex for a number followed by two characters, such as "1st", "2nd", "10th", "22nd"?

Question

I have a dataset of phone calls transcribed into text, where each sample contains text. I'm trying to identify all the samples where dates are mentioned. To be clear, I'm only looking for samples where a number followed by two additional characters is present, like "1st", "2nd", "25th".

Right now, I have a rather brute-force approach. It does the job, but is there a cleaner way to achieve the same thing using regex?


import re

def date_mentioned(text):
    date_list = ['1st', '2nd', '3rd', '4th', '5th', '6th', '7th', '8th', '9th', '10th', '11th', '12th', '13th', '14th', '15th', '16th', '17th', '18th', '19th', '20th', '21st', '22nd', '23rd', '24th', '25th', '26th', '27th', '28th', '29th', '30th', '31st']

    for date in date_list:
        if re.search(date, text):
            return True
    return False

Answer 1

You could use a regular expression for this. You may try:

r'\d{1,2}(?:st|nd|rd|th)'

Details

\d{1,2}(?:st|nd|rd|th)
- \d matches a digit (equivalent to [0-9])
  - {1,2} quantifier: matches between 1 and 2 times
- Non-capturing group (?:st|nd|rd|th)
  - 1st alternative st
    - st matches the characters st literally (case sensitive)
  - 2nd alternative nd
    - nd matches the characters nd literally (case sensitive)
  - 3rd alternative rd
    - rd matches the characters rd literally (case sensitive)
  - 4th alternative th
    - th matches the characters th literally (case sensitive)
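As a quick check of this pattern, the sketch below rewrites the question's date_mentioned helper around it. The ORDINAL name and the added \b word boundaries are assumptions for illustration, not part of the answer; the boundaries only keep the pattern from matching inside longer tokens such as 123rd.

```python
import re

# Assumed name; \b keeps the match from starting or ending mid-token.
ORDINAL = re.compile(r"\b\d{1,2}(?:st|nd|rd|th)\b")

def date_mentioned(text):
    """Return True if text contains one or two digits followed by st, nd, rd, or th."""
    return ORDINAL.search(text) is not None

print(date_mentioned("let's meet on the 22nd of June"))  # True
print(date_mentioned("call me in 5 minutes"))            # False
```

Note that this pattern does not check that the suffix fits the number, so 2th or 99th would also count as a match.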

Answer 2

For general numbers, \b\d*(?:(?<!1)(?:1st|2nd|3rd)|(?:[04-9]|1[1-3])th)\b should do what you want. The lookbehind (?<!1) keeps 11st, 12nd, and 13rd from matching, and the \b boundaries keep the match on whole tokens. You can restrict the numbers further for dates, but full validation is complex (months, leap years, etc.), so I'd recommend just blindly parsing the number and then validating it afterwards.

Edit: Thanks for pointing out the mistake with 3rd; fixed.
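The "parse first, validate afterwards" idea could look roughly like the sketch below. It assumes a general-ordinal pattern that uses a negative lookbehind to reject 11st, 12nd, and 13rd, and it only validates the day-of-month range; find_days and ORDINAL are hypothetical names, and month or leap-year checks are left out.

```python
import re

# Assumed general-ordinal pattern: (?<!1) rejects 11st, 12nd, and 13rd,
# and \b keeps the match on whole tokens.
ORDINAL = re.compile(r"\b\d*(?:(?<!1)(?:1st|2nd|3rd)|(?:[04-9]|1[1-3])th)\b")

def find_days(text):
    """Return numbers from well-formed ordinals that could be a day of the month."""
    days = []
    for match in ORDINAL.finditer(text):
        n = int(match.group()[:-2])  # drop the two-letter suffix
        if 1 <= n <= 31:             # validate after parsing, outside the regex
            days.append(n)
    return days

print(find_days("due on the 3rd, not the 12nd or the 40th"))  # [3]
```

Keeping the range check in Python rather than in the pattern makes it easy to tighten later, for example once the month is known.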

Answer 3

You can find these dates with:

[0-9]{1,2}(?:st|nd|rd|th)

Explanation:
1 or 2 digits,
followed by st, nd, rd or th

Answer 4

Since you're looking for ordinal numbers, the rules are:

If the number ends in 1 but not in 11, add 'st'
If the number ends in 2 but not in 12, add 'nd'
If the number ends in 3 but not in 13, add 'rd'
For all other numbers, add 'th'
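These suffix rules can be written down directly. The sketch below is one way to do it, with ordinal as a hypothetical helper name; it also rebuilds the question's hand-typed date_list from the rules.

```python
def ordinal(n):
    """Format a non-negative integer with its English ordinal suffix."""
    if 11 <= n % 100 <= 13:          # 11, 12, 13 (and 111, 212, ...) take "th"
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"

# Rebuild the day-of-month list from the question instead of typing it out.
date_list = [ordinal(day) for day in range(1, 32)]
print(date_list[:4], date_list[10:13], date_list[-1])
# ['1st', '2nd', '3rd', '4th'] ['11th', '12th', '13th'] 31st
```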

A regex that can distinguish between these cases for one- and two-digit numbers is:

'^(?:11th|12th|13th|[02-9]?(?:1st|2nd|3rd)|\d?[04-9]th)$'

The alternatives sit inside a non-capturing group so that ^ and $ apply to every one of them.

And the application is:

import re

def date_mentioned(text):
    # True only when the whole string is a well-formed ordinal such as "1st" or "22nd"
    if re.match(r'^(?:11th|12th|13th|[02-9]?(?:1st|2nd|3rd)|\d?[04-9]th)$', text):
        return True
    return False

RegEx explanation
We're looking for this sequence:

^ : start of the string
(?: : opening of the group of alternatives
11th : string 11th
| : or
12th : string 12th
| : or
13th : string 13th
| : or
[02-9]? : an optional leading digit other than 1, followed by
(?:1st|2nd|3rd) : string 1st, 2nd, or 3rd
| : or
\d? : 0 or 1 digits, followed by
[04-9] : one digit that is 0 or in the inclusive range 4-9
th : string th
) : closing of the group
$ : end of the string
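To run this kind of check against whole transcripts rather than a single token, one option is to swap the ^ and $ anchors for \b word boundaries and call re.search. The sketch below assumes that approach; DAY_ORDINAL is a hypothetical name, and the alternatives are grouped so the boundaries apply to all of them.

```python
import re

# Word boundaries instead of ^ and $, so the pattern can match inside a sentence.
DAY_ORDINAL = re.compile(
    r"\b(?:11th|12th|13th|[02-9]?(?:1st|2nd|3rd)|\d?[04-9]th)\b"
)

def date_mentioned(text):
    """Return True if a well-formed one- or two-digit ordinal appears in text."""
    return DAY_ORDINAL.search(text) is not None

samples = ["I'll call back on the 21st", "that's the 11st time", "no dates here"]
print([date_mentioned(s) for s in samples])  # [True, False, False]
```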

4 answers

Answer 1
1 Accepted 2019-06-04 15:21:24

Answer 2
1 2019-06-04 15:23:25

Answer 3
1 2019-06-04 15:23:39

Answer 4
1 2019-06-04 15:48:54
