How to remove strings before a numeric value in a pandas dataframe column?
I have a pandas dataframe column with strings that look like this:
Column A
text moretext 251 St. Louis Apt.54
123 Orange Drive
sometext somemoretext 171 Poplar street
textnew 11th street
77 yorkshire avenue
I want to remove the text before the numeric values, i.e. I want the output to be something like this:
Column A
251 St. Louis Apt.54
123 Orange Drive
171 Poplar street
11th street
77 yorkshire avenue
Let's use a regex with str.extract:
df['Column A'] = df['Column A'].str.extract(r'(\d+.+$)', expand=False)
Output:
0 251 St. Louis Apt.54
1 123 Orange Drive
2 171 Poplar street
3 11th street
4 77 yorkshire avenue
Name: Column A, dtype: object
The regex captures a group that starts with one or more digits and continues to the end of the line.
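To make the regex approach easy to try, here is a minimal self-contained sketch. It assumes the sample rows from the question and passes expand=False so that str.extract returns a Series rather than a one-column DataFrame; the variable name cleaned is only illustrative.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Column A": [
        "text moretext 251 St. Louis Apt.54",
        "123 Orange Drive",
        "sometext somemoretext 171 Poplar street",
        "textnew 11th street",
        "77 yorkshire avenue",
    ]
})

# Capture from the first run of digits through the end of the string.
# expand=False yields a Series for a single capture group.
cleaned = df["Column A"].str.extract(r"(\d+.+$)", expand=False)
print(cleaned.tolist())
```

Note that rows in which the pattern does not match (for example, strings with no digit at all) come back as NaN with this approach.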
This function finds the index of the first numeric character in the string and returns the rest of the string from that position. It is then applied to each value of the column using apply:
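def change(string):
    # Return the string from its first digit onward;
    # a string with no digit is returned unchanged
    for i, c in enumerate(string):
        if c.isdigit():
            return string[i:]
    return string

# Series.apply calls the function on each value (it takes no axis argument)
data['Column A'] = data['Column A'].apply(change)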
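For anyone who wants to run the loop-based idea end to end, here is a small sketch. It assumes the column is named Column A and that values containing no digit should be left unchanged, a case the original post does not cover. The function name strip_before_first_digit is hypothetical.

```python
import pandas as pd

def strip_before_first_digit(text):
    """Return text from its first digit onward; text without digits is returned unchanged."""
    for i, ch in enumerate(text):
        if ch.isdigit():
            return text[i:]
    return text  # assumption: keep values that contain no digit as they are

data = pd.DataFrame({
    "Column A": [
        "textnew 11th street",
        "77 yorkshire avenue",
        "no number here",
    ]
})

# Series.apply calls the function once per value in the column
data["Column A"] = data["Column A"].apply(strip_before_first_digit)
print(data["Column A"].tolist())
```

Compared with str.extract, this version keeps non-matching values instead of turning them into NaN, at the cost of a Python-level loop for each row.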