
Regular expression and substitution in Python

I have a content string like:

content = """
the patient monitoring system shall perform a daily device check from 1:30 am to 4:30 am (patient local time). if a device malfunction is detected, the daily device check shall send the malfunction to the clinician. if a patient health alarm is detected, the daily device check shall turn into full interrogation as specified in srs-3003. if no device or patient health issue identified, the daily device check shall end without further notification to the clinicians or patient. if a scheduled interrogation happens on the same day, the daily device check shall be skipped. if any device issue detected during the daily device check, the patient monitoring system shall alarm the patient with red urgent light. . if any patient health issue detected during the daily device check, the patient monitoring system shall alarm the patient with yellow warning light. . if a daily device check fails, it should be retried in 15 minutes up to 3 times. if a daily device check still fails after 3 times, the patient monitoring system shall end the interrogation and notify patient of the failed device check at 8 am that morning. there are 3 types of interrogations as below:
1. scheduled interrogation.
2. daily device check
3. patient initiated interrogation. an interrogation could fail due to the following reasons:
1. failed to establish communication.
2. communication lost.
3. failed to obtain a key data from the implanted device.
"""

I want to remove the list markers such as 1. 2. 3. and so on, without affecting numbers that are part of the actual content, such as srs-3003.

If I use the following regular expression: re.findall("\d{1}\.", content), the result is ['3.', '1.', '2.', '3.', '1.', '2.', '3.'], and the '3.' in srs-3003. will also be removed from content in the next step:

num_dot = re.findall(r"\d+\.", content)
for num in num_dot:
    content = content.replace(num, "")

How can I proceed?

Your regex is fine. To avoid matching the 3. in srs-3003., you can add the ^ anchor, something like:

^\d+\.

Explanation of the above regex:

  • ^ - Matches the start of a line (when the re.MULTILINE flag is used; without it, only the start of the string).
  • \d+ - Matches a digit one or more times.
  • \. - Matches . literally.

If you also want to remove the space that follows each list number, append " +" or \s+ to the pattern.
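As a quick check of the pieces described above, here is a minimal sketch on a short made-up sample (the sample string is an assumption for illustration, not the original content). It shows that ^ only matches at the start of each line when re.MULTILINE is passed, and what appending \s+ does.

```python
import re

# Hypothetical sample text, shortened for illustration.
sample = (
    "as specified in srs-3003. there are 2 types:\n"
    "1. scheduled interrogation.\n"
    "2. daily device check"
)

# Without re.MULTILINE, ^ matches only at the very start of the string,
# so no list marker is found here.
no_flag = re.findall(r"^\d+\.", sample)

# With re.MULTILINE, ^ matches at the start of every line.
with_flag = re.findall(r"^\d+\.", sample, flags=re.MULTILINE)

# Appending \s+ also consumes the whitespace after the list number.
stripped = re.sub(r"^\d+\.\s+", "", sample, flags=re.MULTILINE)

print(no_flag)    # []
print(with_flag)  # ['1.', '2.']
print(stripped)
```

The "3." inside srs-3003. is never matched because it does not sit at the start of a line.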

You can find the demo of the above regex here.


Sample implementation in Python:

import re

regex = r"^\d+\."

test_str = ("the patient monitoring system shall perform a daily device check from 1:30 am to 4:30 am (patient local time). if a device malfunction is detected, the daily device check shall send the malfunction to the clinician. if a patient health alarm is detected, the daily device check shall turn into full interrogation as specified in srs-3003. if no device or patient health issue identified, the daily device check shall end without further notification to the clinicians or patient. if a scheduled interrogation happens on the same day, the daily device check shall be skipped. if any device issue detected during the daily device check, the patient monitoring system shall alarm the patient with red urgent light. . if any patient health issue detected during the daily device check, the patient monitoring system shall alarm the patient with yellow warning light. . if a daily device check fails, it should be retried in 15 minutes up to 3 times. if a daily device check still fails after 3 times, the patient monitoring system shall end the interrogation and notify patient of the failed device check at 8 am that morning. there are 3 types of interrogations as below:\n"
    "1. scheduled interrogation.\n"
    "2. daily device check\n"
    "3. patient initiated interrogation. an interrogation could fail due to the following reasons:\n"
    "1. failed to establish communication.\n"
    "2. communication lost.\n"
    "3. failed to obtain a key data from the implanted device.")

subst = ""

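# Limit the number of replacements with count (0 means replace all matches).
# count and flags are passed by keyword; passing them positionally is deprecated in recent Python versions.
result = re.sub(regex, subst, test_str, count=0, flags=re.MULTILINE)

if result:
    print(result)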

Please find a sample run of the above program here.
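To see why the find-then-replace loop from the question damages srs-3003. while the anchored substitution does not, here is a small side-by-side sketch. The sample text and the helper names strip_with_replace and strip_with_sub are made up for illustration.

```python
import re

# Hypothetical sample text, shortened for illustration.
sample = "see srs-3003. types:\n1. scheduled\n2. daily\n3. patient initiated"


def strip_with_replace(text):
    # Approach from the question: collect the matches, then str.replace each one.
    # str.replace removes every occurrence of the substring, wherever it appears,
    # so the "3." at the end of "srs-3003." is removed as well.
    for num in re.findall(r"\d{1}\.", text):
        text = text.replace(num, "")
    return text


def strip_with_sub(text):
    # Anchored substitution: only a number directly at the start of a line is removed.
    return re.sub(r"^\d+\.", "", text, flags=re.MULTILINE)


print(strip_with_replace(sample))
print(strip_with_sub(sample))
```

The first call turns srs-3003. into srs-300, while the second leaves it intact, because re.sub replaces only the positions the pattern actually matched instead of searching for the matched text again.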
