How to use positive and negative lookahead for multiple terms in Python?

Question

I have a dataframe like the one below:

import pandas as pd

df = pd.DataFrame({'person_id': [11,11,11,11,11,11,11,11,11,11],
                   'text':['inJECTable 1234 Eprex DOSE 4000 units on NONd',
                           'department 6789 DOSE 8000 units on DIALYSIS days  -  IV Interm',
                           'inJECTable 4321 Eprex DOSE - 3 times/wk on NONdialysis day',
                           'insulin MixTARD  30/70 - inJECTable 46 units',
                           'insulin ISOPHANE -- InsulaTARD  Vial -  inJECTable 56 units  SC SubCutaneous',
                           '1-alfacalcidol DOSE 1 mcg  - 3 times a week  -  IV Intermittent',
                           'jevity liquid - FEEDS PO  Jevity  -  237 mL  -  1 times per day',
                           '1-alfacalcidol DOSE 1 mcg  - 3 times per week  -  IV Intermittent',
                           '1-supported DOSE 1 mcg  - 1 time/day  -  IV Intermittent',
                           '1-testpackage DOSE 1 mcg  - 1 time a day  -  IV Intermittent']})

I want to remove words/strings that follow a pattern, such as `46 units`, `3 times a week`, `3 times per week` and `1 time/day`.

I have been reading about positive and negative lookahead and lookbehind.

So I am trying something like the following:

[^([0-9\s]*(?=units))]  #to remove terms like `46 units` from the string
[^[0-9\s]*(?=times)(times a day)] # don't know how to make this work for all time variants

Examples of the time variants: `3 times a day`, `3 time/wk`, `3 times per day`, `3 times a month`, `3 times/month`.
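For comparison, here is a minimal sketch of what a lookahead such as `(?=units)` actually does, using Python's `re` module on one row from the dataframe above. A lookahead only asserts that `units` follows; it does not consume it, so substituting the match away leaves the word behind. (Also note that square brackets define a character class, not a group, so `[^...]` cannot be used to wrap a pattern.)

```python
import re

text = "insulin MixTARD  30/70 - inJECTable 46 units"

# (?=units) checks what follows without consuming it,
# so only the digits and the space before "units" are matched.
lookahead = re.compile(r"\d+\s*(?=units)")
print(lookahead.findall(text))  # ['46 ']
print(lookahead.sub("", text))  # insulin MixTARD  30/70 - inJECTable units

# Matching the word itself (no lookahead) removes the whole term.
consuming = re.compile(r"\s*\d+\s*units")
print(consuming.sub("", text))  # insulin MixTARD  30/70 - inJECTable
```

So for deletion, a plain consuming pattern is what is needed; lookarounds are useful when the surrounding context must be checked but kept.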

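Basically, I want my output to look like the following, with terms such as `xx units`, `xx times a day`, `xx times a week`, `xx time/day`, `xx time/wk`, `xx times per week`, etc. removed: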

Answer 1

You can use the following pattern:

\s*\d+\s*(?:units?|times?(?:\s+(?:a|per)\s+|\s*/\s*)(?:d(?:ay)?|w(?:ee)?k|month|y(?:ea)?r?))
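A quick way to check the pattern against the variants listed in the question is to compile it once and call `re.fullmatch` on each. A sketch; the sample list comes from the question, plus `1 mcg` as a term that should not match:

```python
import re

TERM_RE = re.compile(
    r"\s*\d+\s*(?:units?|times?(?:\s+(?:a|per)\s+|\s*/\s*)(?:d(?:ay)?|w(?:ee)?k|month|y(?:ea)?r?))"
)

# Variants mentioned in the question, plus "1 mcg", which should be left alone.
samples = ["46 units", "3 times a week", "3 times per week", "1 time/day",
           "3 times a day", "3 time/wk", "3 times per day", "3 times a month",
           "3 times/month", "1 mcg"]

for s in samples:
    status = "match" if TERM_RE.fullmatch(s) else "no match"
    print(f"{s!r:20} {status}")
```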

See the regex demo.

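Note: `\d+` matches one or more digits. If you need to match any number, consider using a different pattern for numbers in the format you expect; see, for example, "Regular expression for finding decimal/float numbers?".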
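As an illustration, here is a sketch that swaps `\d+` for `\d+(?:\.\d+)?`, assuming numbers are either integers or dot-separated decimals such as `2.5`. The input string is hypothetical (the question's data has only integers), and other formats like `1,000` would need yet another number pattern:

```python
import re

# Everything after the number is unchanged from the answer's pattern.
SUFFIX = r"\s*(?:units?|times?(?:\s+(?:a|per)\s+|\s*/\s*)(?:d(?:ay)?|w(?:ee)?k|month|y(?:ea)?r?))"

int_re = re.compile(r"\s*\d+" + SUFFIX)            # integers only, as in the answer
dec_re = re.compile(r"\s*\d+(?:\.\d+)?" + SUFFIX)  # assumption: optional ".digits" part

text = "inJECTable 2.5 units SC"  # hypothetical input
print(int_re.sub("", text))  # inJECTable 2. SC  (only "5 units" is removed)
print(dec_re.sub("", text))  # inJECTable SC
```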

Pattern details

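`\s*` - zero or more whitespace characters
`\d+` - one or more digits
`\s*` - zero or more whitespace characters
`(?:units?|times?(?:\s+(?:a|per)\s+|\s*/\s*)(?:d(?:ay)?|w(?:ee)?k|month|y(?:ea)?r?))` - a non-capturing group matching:
- `units?` - `unit` or `units`
- `|` - or
- `times?` - `time` or `times`
- `(?:\s+(?:a|per)\s+|\s*/\s*)` - `a` or `per` surrounded by one or more whitespace characters, or `/` surrounded by zero or more whitespace characters
- `(?:d(?:ay)?|w(?:ee)?k|month|y(?:ea)?r?)` - `d` or `day`, `wk` or `week`, `month`, or `y`, `yr`, `yea` or `year`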
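The breakdown above maps directly onto a `re.VERBOSE` version of the same pattern, which some may find easier to maintain. A sketch; in verbose mode unescaped whitespace in the pattern is ignored and `#` starts a comment, so the compact and verbose forms should behave identically:

```python
import re

TERM_RE = re.compile(
    r"""
    \s*                      # zero or more whitespace characters
    \d+                      # one or more digits
    \s*                      # zero or more whitespace characters
    (?:                      # non-capturing group:
        units?               #   unit or units
      |                      #   or
        times?               #   time or times
        (?:\s+(?:a|per)\s+   #   "a" or "per" surrounded by 1+ whitespace chars
          |\s*/\s*)          #   or "/" surrounded by 0+ whitespace chars
        (?:d(?:ay)?          #   d or day
          |w(?:ee)?k         #   wk or week
          |month             #   month
          |y(?:ea)?r?)       #   y, yr, yea or year
    )
    """,
    re.VERBOSE,
)

# The compact one-line form from the answer, for comparison.
COMPACT_RE = re.compile(
    r"\s*\d+\s*(?:units?|times?(?:\s+(?:a|per)\s+|\s*/\s*)(?:d(?:ay)?|w(?:ee)?k|month|y(?:ea)?r?))"
)

print(TERM_RE.sub("", "inJECTable 4321 Eprex DOSE - 3 times/wk on NONdialysis day"))
# inJECTable 4321 Eprex DOSE - on NONdialysis day
```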

If you need to match whole words only, use word boundaries (`\b`):

\s*\b\d+\s*(?:units?|times?(?:\s+(?:a|per)\s+|\s*/\s*)(?:d(?:ay)?|w(?:ee)?k|month|y(?:ea)?r?))\b
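To see what the word boundaries change, the sketch below runs both versions on a few strings. The first two inputs are made-up edge cases, not rows from the question: without `\b`, digits glued to a preceding letter and units glued to following letters are still stripped.

```python
import re

BODY = r"(?:units?|times?(?:\s+(?:a|per)\s+|\s*/\s*)(?:d(?:ay)?|w(?:ee)?k|month|y(?:ea)?r?))"
plain = re.compile(r"\s*\d+\s*" + BODY)              # no word boundaries
bounded = re.compile(r"\s*\b\d+\s*" + BODY + r"\b")  # whole words only

for s in ["A12 units", "3 times/weekly", "dose 46 units"]:
    print(repr(plain.sub("", s)), repr(bounded.sub("", s)))
# 'A' 'A12 units'
# 'ly' '3 times/weekly'
# 'dose' 'dose'
```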

In pandas, use:

df['text'] = df['text'].str.replace(r'\s*\b\d+\s*(?:units?|times?(?:\s+(?:a|per)\s+|\s*/\s*)(?:d(?:ay)?|w(?:ee)?k|month|y(?:ea)?r?))\b', '', regex=True)
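Put together with the dataframe from the question, a runnable sketch looks like this. Passing `regex=True` matters: since pandas 2.0, `Series.str.replace` defaults to `regex=False` and would otherwise search for the pattern as a literal string.

```python
import pandas as pd

df = pd.DataFrame({'person_id': [11, 11, 11, 11, 11, 11, 11, 11, 11, 11],
                   'text': ['inJECTable 1234 Eprex DOSE 4000 units on NONd',
                            'department 6789 DOSE 8000 units on DIALYSIS days  -  IV Interm',
                            'inJECTable 4321 Eprex DOSE - 3 times/wk on NONdialysis day',
                            'insulin MixTARD  30/70 - inJECTable 46 units',
                            'insulin ISOPHANE -- InsulaTARD  Vial -  inJECTable 56 units  SC SubCutaneous',
                            '1-alfacalcidol DOSE 1 mcg  - 3 times a week  -  IV Intermittent',
                            'jevity liquid - FEEDS PO  Jevity  -  237 mL  -  1 times per day',
                            '1-alfacalcidol DOSE 1 mcg  - 3 times per week  -  IV Intermittent',
                            '1-supported DOSE 1 mcg  - 1 time/day  -  IV Intermittent',
                            '1-testpackage DOSE 1 mcg  - 1 time a day  -  IV Intermittent']})

# The whole-word pattern from the answer, split across two raw strings for readability.
pattern = (r"\s*\b\d+\s*(?:units?|times?(?:\s+(?:a|per)\s+|\s*/\s*)"
           r"(?:d(?:ay)?|w(?:ee)?k|month|y(?:ea)?r?))\b")

# regex=True: treat `pattern` as a regular expression, not a literal string.
df['text'] = df['text'].str.replace(pattern, '', regex=True)

for t in df['text']:
    print(t)
```

Note that the spaces and dashes around a removed term stay in place (e.g. `1 mcg  -  -  IV Intermittent`); a follow-up replace could collapse them if needed.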

Answer 1: 3 votes, accepted, 2020-10-01 14:14:23
