In a DataFrame, remove parentheses and dashes from phone numbers and handle the international prefix
In a DataFrame, how do I remove the unnecessary characters from the contact number?
df
Id Phone
1 (+1)123-456-7890
2 (123)-(456)-(7890)
3 123-456-7890
Final output
Id Phone
1 1234567890
2 1234567890
3 1234567890
I would use a regex with str.replace here:
df['Phone2'] = df['Phone'].str.replace(r'^(?:\(\+\d+\))|\D', '', regex=True)
output:
Id Phone Phone2
0 1 (+1)123-456-7890 1234567890
1 2 (123)-(456)-(7890) 1234567890
2 3 123-456-7890 1234567890
regex:
^(?:\(\+\d+\))   # match a leading (+<digits>) prefix, e.g. (+1)
| # OR
\D # match a non-digit
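To check the pattern outside pandas, here is a minimal sketch using the standard library's re.sub, which performs the same per-string substitution that Series.str.replace(..., regex=True) applies to each element. The helper name strip_phone is hypothetical, not part of the original answer.

```python
import re

# Leading "(+<digits>)" international prefix, OR any single non-digit character.
PATTERN = re.compile(r'^(?:\(\+\d+\))|\D')

def strip_phone(raw: str) -> str:
    """Hypothetical helper: drop a leading (+N) prefix and every non-digit."""
    return PATTERN.sub('', raw)

samples = ['(+1)123-456-7890', '(123)-(456)-(7890)', '123-456-7890']
cleaned = [strip_phone(s) for s in samples]
print(cleaned)  # ['1234567890', '1234567890', '1234567890']
```

Because \d+ accepts any number of digits, a longer prefix such as (+380) is discarded just like (+1), so the country code is lost entirely.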
The international prefix might be important to keep, though.
Keep the prefixes:
df['Phone2'] = df['Phone'].str.replace(r'[^+\d]', '', regex=True)
output (a fourth row, (+380)123-456-7890, was added to illustrate a longer prefix):
Id Phone Phone2
0 1 (+1)123-456-7890 +11234567890
1 2 (123)-(456)-(7890) 1234567890
2 3 123-456-7890 1234567890
3 4 (+380)123-456-7890 +3801234567890
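A similar sketch for the prefix-preserving pattern, again with re.sub standing in for the pandas call; keep_prefix is a hypothetical name. The negated class [^+\d] deletes everything except + and digits, so the + stays attached to the country code.

```python
import re

# Delete every character that is neither "+" nor a digit.
KEEP_PLUS = re.compile(r'[^+\d]')

def keep_prefix(raw: str) -> str:
    """Hypothetical helper: keep '+' and digits, drop parentheses, dashes, spaces."""
    return KEEP_PLUS.sub('', raw)

inputs = ['(+1)123-456-7890', '(123)-(456)-(7890)',
          '123-456-7890', '(+380)123-456-7890']
results = [keep_prefix(raw) for raw in inputs]
print(results)  # ['+11234567890', '1234567890', '1234567890', '+3801234567890']
```

One caveat: a stray + in the middle of the input is also kept (e.g. '123+456' stays unchanged), so this assumes reasonably well-formed numbers.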
Only drop a specific prefix (here +1):
df['Phone2'] = df['Phone'].str.replace(r'^(?:\(\+1\))|[^+\d]', '', regex=True)
# or, more flexible: drops +1 followed by any non-digit, with or without parentheses
df['Phone2'] = df['Phone'].str.replace(r'(?:\+1\D)|[^+\d]', '', regex=True)
output:
Id Phone Phone2
0 1 (+1)123-456-7890 1234567890
1 2 (123)-(456)-(7890) 1234567890
2 3 123-456-7890 1234567890
3 4 (+380)123-456-7890 +3801234567890
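To illustrate what "more flexible" means, the sketch below (stdlib re.sub again; ANCHORED and FLEXIBLE are hypothetical names) runs both specific-prefix patterns on the sample numbers plus one extra input, '+1 123-456-7890', which is an assumption and not part of the original data. The anchored pattern only recognizes (+1) in parentheses at the very start, whereas \+1\D matches +1 followed by any non-digit anywhere in the string.

```python
import re

# Variant 1: drop "(+1)" only at the start; otherwise keep "+" and digits.
ANCHORED = re.compile(r'^(?:\(\+1\))|[^+\d]')
# Variant 2: drop "+1" followed by any non-digit, wherever it appears.
FLEXIBLE = re.compile(r'(?:\+1\D)|[^+\d]')

samples = ['(+1)123-456-7890', '(123)-(456)-(7890)', '123-456-7890',
           '(+380)123-456-7890',
           '+1 123-456-7890']  # hypothetical extra input without parentheses

for raw in samples:
    print(f'{raw!r:24} anchored={ANCHORED.sub("", raw)!r:18} '
          f'flexible={FLEXIBLE.sub("", raw)!r}')
```

Both variants agree on the four original rows, leaving +380 intact; only the flexible one strips the unparenthesized '+1 ' (the anchored one yields '+11234567890').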