How do I remove numbers from a df column containing numbers and text (3 or ABC, but not mixtures like ABC123), leaving blank cells?
I have a dataframe where the first column, let's call it df['Name'], looks like the "actual" column, and I'd like to change it to look like the "desired" column in order to do operations on the following columns. Here are the actual and desired outputs:
Name (actual) | Name (desired) |
---|---|
string1 | string1 |
Number | string1 |
Number | string1 |
Number | string1 |
string2 | string2 |
Number | string2 |
Number | string2 |
Number | string2 |
Number | string2 |
string3 | string3 |
Number | string3 |
Number | string3 |
string4 | string4 |
Number | string4 |
etc | etc |
There is no fixed number of 'numbers' between the names. Could be 3, could be 300.
I have the following code to forward fill the names as far as the next name:
df['Name'].fillna(method = 'ffill', inplace = True)
but it only works when the cells with numbers are empty.
So, I need to remove all the numbers from the ['Name'] series first, leaving empty cells:
Name |
---|
String1 |
blank |
blank |
blank |
String2 |
blank |
etc... |
I can't find a way to remove the numbers. I've tried some suggestions I found in other similar posts:
1)
df[df['Name'].apply(lambda x: isinstance(x, str))]
but it seems to do nothing.
2)
df['Name'] = df['Name'].apply(lambda x: isinstance(x, str))
turns the whole ['Name'] series to True, both strings and numbers.
3)
df['Name'] = df[df['Name'].apply(lambda x: isinstance(x, str))]
which gives a value error.
I found the result of 2) strange, but discovered that df['Name'].dtype gave me dtype('O'), which I'd never seen before, but which suggests the names (strings) and numbers (integers/floats) in the ['Name'] series are the same type (numpy objects). Not sure if/how it's relevant, but I understood it to mean that Python sees both the text and numbers as being the same type.
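As a side note on dtype('O'): it only says the column holds arbitrary Python objects, so each element can still have its own type. A minimal sketch of how that could be checked, using made-up sample values (the data and the variable names s, mixed and is_str are assumptions, not taken from the real dataframe):

```python
import pandas as pd

# Hypothetical sample: numbers that were read in as text
# (dtype=object is forced so the result is the same across pandas versions)
s = pd.Series(["string1", "123", "4.5", "string2"], name="Name", dtype=object)

print(s.dtype)              # object
print(s.map(type).unique()) # the distinct Python types stored in the column

# A genuinely mixed column is also dtype object,
# but here isinstance can tell the elements apart
mixed = pd.Series(["string1", 123, 4.5, "string2"], name="Name", dtype=object)
is_str = mixed.map(lambda x: isinstance(x, str))
print(is_str.tolist())
```

If attempt 2) returns True for every row, that suggests the "numbers" are stored as strings such as "123", which would explain why isinstance cannot separate them from the names.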
I'm stuck. Any suggestions on how to remove the numbers and fill the way I explained?
Thanks!
Using apply is not efficient; prefer a vectorized method:
# identify numbers:
m = pd.to_numeric(df['Name'], errors='coerce').notna()
# mask and ffill:
df['Name'] = df['Name'].mask(m).ffill()
Example (assigning to a new column "Name2" for clarity):
Name Name2
0 string1 string1
1 123 string1
2 123 string1
3 123 string1
4 string2 string2
5 123 string2
6 123 string2
7 123 string2
8 123 string2
9 string3 string3
10 123 string3
11 123 string3
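For anyone who wants to reproduce this end to end, here is a self-contained sketch of the same mask-and-ffill approach. The sample values and the intermediate names is_number and blanked are made up for illustration, and the numbers are assumed to be stored as text:

```python
import pandas as pd

# Hypothetical sample data: names and numbers mixed in one column,
# with the numbers stored as strings
df = pd.DataFrame(
    {"Name": ["string1", "123", "4.5", "string2", "7", "string3"]},
    dtype=object,
)

# True where the value can be parsed as a number; names become NaN and fail notna()
is_number = pd.to_numeric(df["Name"], errors="coerce").notna()

# Intermediate step: blank out the numeric cells (they become NaN)
blanked = df["Name"].mask(is_number)

# Forward fill each name down to the next name
df["Name2"] = blanked.ffill()

print(df)
```

The blanked series corresponds to the "blank cells" table in the question. One caveat: a name that itself parses as a number (for example "1e5") would be blanked too. Also, in recent pandas versions fillna(method='ffill') is deprecated in favor of ffill().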