Replace N digit numbers in a sentence with specific strings for different values of N

Question

I have a bunch of strings in a pandas DataFrame that contain numbers. I can run the code below to replace all of them:

df.feature_col = df.feature_col.str.replace(r'\d+', ' NUM ', regex=True)

But what I need to do is replace any 10-digit number with a string like masked_id, any 16-digit number with account_number, any three-digit number with yet another string, and so on.

How do I go about doing this?

PS: my dataset is small, so a less-than-optimal approach is good enough for me.

Answer 1

You could do a series of replacements, one for each length of number:

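# \b anchors each pattern so that only numbers of exactly that length match
df.feature_col = df.feature_col.str.replace(r'\b\d{3}\b', ' 3mask ', regex=True)
df.feature_col = df.feature_col.str.replace(r'\b\d{10}\b', 'masked_id', regex=True)
df.feature_col = df.feature_col.str.replace(r'\b\d{16}\b', 'account_number', regex=True)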
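
The per-length replacements above can also be folded into a single pass by matching every run of digits and choosing the label from its length. The following is a minimal sketch using only the standard `re` module. The names `LABELS`, `_label_for` and `mask_numbers` are hypothetical, and the mapping just reuses the labels from the question.

```python
import re

# Hypothetical mapping from digit count to replacement label; adjust as needed.
LABELS = {3: "3mask", 10: "masked_id", 16: "account_number"}

def _label_for(match):
    """Return the label for this number's length, or the number itself."""
    digits = match.group()
    # Numbers whose length is not in LABELS are left untouched.
    return LABELS.get(len(digits), digits)

def mask_numbers(text):
    """Replace each standalone number in text according to its digit count."""
    # \b\d+\b matches a whole run of digits that is not glued to letters.
    return re.sub(r"\b\d+\b", _label_for, text)
```

With pandas this could be applied as df.feature_col.apply(mask_numbers), or the callback could be passed directly as the replacement in df.feature_col.str.replace(r'\b\d+\b', _label_for, regex=True), since str.replace accepts a callable when regex=True.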

Answer 2

Another way is to call replace with regex=True and a dictionary of patterns. You can also use somewhat more relaxed match patterns than Tim's, as long as they are listed in decreasing length order:

import pandas as pd

# test data
df = pd.DataFrame({'feature_col': ['this has 1234567',
                                   'this has 1234',
                                   'this has 123',
                                   'this has none']})

# patterns in decreasing length order
# these of course would replace '12345' with 'ID45' :-)
df['feature_col'] = df.feature_col.replace({r'\d{7}': 'ID7',
                                            r'\d{4}': 'ID4',
                                            r'\d{3}': 'ID3'},
                                           regex=True)

Output:

     feature_col
0   this has ID7
1   this has ID4
2   this has ID3
3  this has none
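
The caveat in the code comment (a 5-digit number being partly replaced) can be checked in isolation. The sketch below uses plain `re` rather than pandas and compares the relaxed pattern with two stricter variants. The variable names are illustrative only, and the lookaround form is an alternative not used in either answer.

```python
import re

text = "this has 12345"

# Relaxed pattern: matches the first four digits of a longer number.
relaxed = re.sub(r"\d{4}", "ID4", text)

# Word boundaries: only a standalone 4-digit number matches.
bounded = re.sub(r"\b\d{4}\b", "ID4", text)

# Lookarounds: exactly four digits, even when glued to letters.
exact = re.sub(r"(?<!\d)\d{4}(?!\d)", "ID4", "ref1234x and 12345")

print(relaxed)  # this has ID45
print(bounded)  # this has 12345
print(exact)    # refID4x and 12345
```

Which variant fits depends on whether numbers attached to letters (such as ref1234x) should count as matches.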

2 answers

solution1
2 2021-03-11 05:59:53

solution2
2 ACCEPTED 2021-03-11 06:12:27