Modification of "Alter number string" in pandas
Background
I have the following sample df, which is a variation of the question Alter number string in pandas column:
import pandas as pd
df = pd.DataFrame({'Text' : ['Jon J Smith Record #: 0000004 is this ',
'Record #: 0000003 Mary Lisa Hider found here',
'Jane A Doe is also here Record #: 0000002',
'Record #: 0000001'],
'P_ID': [1,2,3,4],
'N_ID' : ['A1', 'A2', 'A3', 'A4']
})
#rearrange columns
df = df[['Text','N_ID', 'P_ID']]
df
Text N_ID P_ID
0 Jon J Smith Record #: 0000004 is this A1 1
1 Record #: 0000003 Mary Lisa Hider fou... A2 2
2 Jane A Doe is also here Record #: 000... A3 3
3 Record #: 0000001 A4 4
Goal
1) Replace the number after Record #: with **BLOCK**
Jon J Smith Record #: 0000004 is this
Jon J Smith Record #: **BLOCK** is this
2) Create a new column, New_Text
Desired Output
Text N_ID P_ID New_Text
0 Jon J Smith Record #: **BLOCK** is this
1 Record #: **BLOCK** Mary Lisa Hider fou...
2 Jane A Doe is also here Record #: **BLOCK**
3 Record #: **BLOCK**
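As a point of comparison, goal 1 on a single string can be sketched with a fixed-width lookbehind. This is a different technique from the capturing-group approach in the answer, shown only to illustrate its limitation: Python's re module requires lookbehinds to be fixed width, so the label must be spelled out exactly (one space after the colon, case-sensitive). The helper name block_number is hypothetical.

```python
import re

# Fixed-width lookbehind: match digits only when they are preceded by exactly
# "Record #: ". A variable-width piece such as \s+ is not allowed inside a
# lookbehind in Python's re, so the single space is hard-coded.
LOOKBEHIND = re.compile(r"(?<=Record #: )\d+")


def block_number(text: str) -> str:
    """Hypothetical helper: replace the digits after 'Record #: ' with **BLOCK**."""
    # The lookbehind is zero-width, so the label itself is never consumed.
    return LOOKBEHIND.sub("**BLOCK**", text)


print(block_number("Jon J Smith Record #: 0000004 is this"))
# Jon J Smith Record #: **BLOCK** is this
print(block_number("Record #:  0000004"))  # two spaces: left unchanged
```

Because of the hard-coded space, a variable amount of whitespace after the colon needs a capturing group instead.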
Tried
I have tried the following, but it is not quite right:
df['New_Text']= df['Text'].replace(r'(?i)record\s+#: \d+', r"Date of Birth: **BLOCK**", regex=True)
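The problem with this attempt can be reproduced with the standard-library re module, which is the engine pandas uses for regex=True. A minimal sketch: the whole match (label plus digits) is replaced, so "Record #:" disappears, and a second space after the colon makes the pattern miss entirely. The variable names are illustrative only.

```python
import re

ATTEMPT_PATTERN = r"(?i)record\s+#: \d+"
ATTEMPT_REPL = "Date of Birth: **BLOCK**"

# The entire match "Record #: 0000004" is replaced, so the label is lost.
attempt = re.sub(ATTEMPT_PATTERN, ATTEMPT_REPL, "Jon J Smith Record #: 0000004 is this ")
print(repr(attempt))  # 'Jon J Smith Date of Birth: **BLOCK** is this '

# With two spaces after the colon, the single literal space no longer matches.
untouched = re.sub(ATTEMPT_PATTERN, ATTEMPT_REPL, "Record #:  0000001")
print(repr(untouched))  # 'Record #:  0000001'
```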
Question
How do I alter my code to get my desired output?
You are matching a single space after the :, which you could turn into \s+ (or repeat the space with + if it can only be spaces), and use a capturing group for the first part. The sample text contains "Record #:" with no "Medical" prefix, so the pattern should start at "record":
(?i)(record\s+#:\s+)\d+
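To see what the whitespace part of the pattern changes, the sketch below compares three variants with the standard-library re module: a single literal space, " +" (spaces only), and \s+ (any whitespace, including tabs). The pattern names and the helper which_match are hypothetical.

```python
import re

# Three candidate patterns for the gap between "#:" and the digits.
PATTERNS = {
    "single": re.compile(r"(?i)(record\s+#: )\d+"),        # exactly one space
    "spaces": re.compile(r"(?i)(record\s+#: +)\d+"),       # one or more spaces
    "whitespace": re.compile(r"(?i)(record\s+#:\s+)\d+"),  # any whitespace
}


def which_match(sample: str) -> list[str]:
    """Return the names of the patterns that find a record number in sample."""
    return [name for name, rx in PATTERNS.items() if rx.search(sample)]


for s in ["Record #: 0000001", "Record #:   0000001", "Record #:\t0000001"]:
    print(repr(s), "->", which_match(s))
```

Only the \s+ variant handles all three inputs; " +" is the stricter choice if tabs or newlines should not count.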
In the replacement, use the backreference \1 to put the captured label back in front of the placeholder:
\1**BLOCK**
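The backreference can be checked on its own with re.sub. This sketch shows the numbered form from the answer next to a named group with \g<label>; the group name label is an illustrative choice. The \g<...> form is unambiguous even when the replacement text continues with a digit, which a bare \1 is not.

```python
import re

text = "Record #: 0000003 Mary Lisa Hider found here"

# Numbered backreference: \1 re-inserts "Record #: "; only the digits change.
numbered = re.sub(r"(?i)(record\s+#:\s+)\d+", r"\1**BLOCK**", text)

# Same substitution with a named group (the name "label" is arbitrary).
named = re.sub(r"(?i)(?P<label>record\s+#:\s+)\d+", r"\g<label>**BLOCK**", text)

print(numbered)  # Record #: **BLOCK** Mary Lisa Hider found here
print(named)     # Record #: **BLOCK** Mary Lisa Hider found here
```

The asterisks in the replacement are literal: re.sub only interprets backslash escapes in the replacement string.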
The final code will look like this:
df['New_Text'] = df['Text'].replace(r'(?i)(record\s+#:\s+)\d+', r"\1**BLOCK**", regex=True)
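An end-to-end sketch combining the sample df from the question with the corrected pattern, so the New_Text column can be compared against the desired output. It assumes a pandas version in which Series.replace with regex=True accepts backreferences in the replacement string (it applies the pattern with Python's re).

```python
import pandas as pd

df = pd.DataFrame({'Text': ['Jon J Smith Record #: 0000004 is this ',
                            'Record #: 0000003 Mary Lisa Hider found here',
                            'Jane A Doe is also here Record #: 0000002',
                            'Record #: 0000001'],
                   'P_ID': [1, 2, 3, 4],
                   'N_ID': ['A1', 'A2', 'A3', 'A4']})
df = df[['Text', 'N_ID', 'P_ID']]  # rearrange columns as in the question

# Group 1 keeps "Record #:" plus the following whitespace; only the digits
# are swapped for **BLOCK**. The original Text column is left untouched.
df['New_Text'] = df['Text'].replace(r'(?i)(record\s+#:\s+)\d+',
                                    r'\1**BLOCK**', regex=True)

for value in df['New_Text']:
    print(repr(value))
```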