
Modification of alter number string pandas

Background

I have the following sample df, which is a variation of "Alter number string in pandas column":

import pandas as pd
df = pd.DataFrame({'Text': ['Jon J Smith  Record #:  0000004 is this ',
                            'Record #:  0000003 Mary Lisa Hider found here',
                            'Jane A Doe is also here Record #:  0000002',
                            'Record #:  0000001'],
                   'P_ID': [1, 2, 3, 4],
                   'N_ID': ['A1', 'A2', 'A3', 'A4']})

#rearrange columns
df = df[['Text','N_ID', 'P_ID']]
df

   Text                                      N_ID  P_ID
0  Jon J Smith Record #: 0000004 is this     A1    1
1  Record #: 0000003 Mary Lisa Hider fou...  A2    2
2  Jane A Doe is also here Record #: 000...  A3    3
3  Record #: 0000001                         A4    4

Goal

1) Replace the number after Record #: with **BLOCK**

Jon J Smith Record #: 0000004 is this
Jon J Smith Record #: **BLOCK** is this

2) Create a new column, New_Text, holding the result

Desired Output

    Text    N_ID    P_ID    New_Text              
0                          Jon J Smith Record #: **BLOCK** is this      
1                          Record #: **BLOCK**  Mary Lisa Hider fou...  
2                          Jane A Doe is also here Record #: **BLOCK**  
3                          Record #: **BLOCK**                          

Tried

I have tried the following, but it is not quite right:

df['New_Text']= df['Text'].replace(r'(?i)record\s+#: \d+', r"Date of Birth: **BLOCK**", regex=True)

Question

How do I alter my code to get my desired output?

Your pattern matches exactly one space after the :, but the sample strings have two. Turn that space into \s+ (or a space followed by + if only spaces can occur there). Your replacement also overwrites the Record #: label with Date of Birth:, so wrap the first part in a capturing group and put it back in the replacement.

(?i)(record\s+#:\s+)\d+

In the replacement, use

\1**BLOCK**

The final piece of code will look like this:

df['New_Text'] = df['Text'].replace(r'(?i)(record\s+#:\s+)\d+', r"\1**BLOCK**", regex=True)
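To check the approach end to end, here is a minimal runnable sketch. It assumes the sample df from the question, where the label is just Record #:, so the capturing group is (record\s+#:\s+). The name pattern is only an illustrative variable, not something from the original post.

```python
import pandas as pd

# Sample data from the question (note the double spaces after "Record #:").
df = pd.DataFrame({'Text': ['Jon J Smith  Record #:  0000004 is this ',
                            'Record #:  0000003 Mary Lisa Hider found here',
                            'Jane A Doe is also here Record #:  0000002',
                            'Record #:  0000001'],
                   'P_ID': [1, 2, 3, 4],
                   'N_ID': ['A1', 'A2', 'A3', 'A4']})
df = df[['Text', 'N_ID', 'P_ID']]

# Group 1 captures the label plus any whitespace after the colon;
# \d+ then matches the number that should be hidden.
pattern = r'(?i)(record\s+#:\s+)\d+'

# \1 re-inserts the captured label, so only the digits are replaced.
df['New_Text'] = df['Text'].replace(pattern, r'\1**BLOCK**', regex=True)

print(df['New_Text'].tolist())
```

Because the label is captured and written back with \1, the original spacing and capitalization of Record #: are preserved; only the digits change. The Text column itself is left untouched.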
