How do I remove numbers from a df column containing numbers and text (3 or ABC, but not mixtures such as ABC123), leaving blank cells?

I have a dataframe whose first column, let's call it df['Name'], looks like the "actual" column below. I'd like to change it to look like the "desired" column so that I can do operations on the following columns. Here are the actual and desired outputs:

Name (actual)   Name (desired)
string1         string1
Number          string1
Number          string1
Number          string1
string2         string2
Number          string2
Number          string2
Number          string2
Number          string2
string3         string3
Number          string3
Number          string3
string4         string4
Number          string4
etc             etc

There is no fixed number of 'numbers' between the names. Could be 3, could be 300.

I have the following code to forward fill the names as far as the next name:

df['Name'].fillna(method = 'ffill', inplace = True)

but it only works when the cells with numbers are empty.
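
The behaviour described here can be reproduced with a small sketch. The data is made up for illustration. Forward fill only replaces missing values (NaN/None), so cells that already hold a number are left untouched:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: names followed by numeric rows.
with_numbers = pd.Series(['string1', 3, 7, 'string2', 5])
with_blanks = pd.Series(['string1', np.nan, np.nan, 'string2', np.nan])

# ffill only replaces missing values, so the numbers survive unchanged.
filled_numbers = with_numbers.ffill().tolist()
print(filled_numbers)

# Once the numeric cells are NaN, the names propagate downwards.
filled_blanks = with_blanks.ffill().tolist()
print(filled_blanks)
```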

So, I need to remove all the numbers from the ['Name'] series first, leaving empty cells:

Name
String1
blank
blank
blank
String2
blank
etc...

I can't find a way to remove the numbers. I've tried some suggestions I found in other similar posts:

1)

df[df['Name'].apply(lambda x: isinstance(x, str))]

but it seems to do nothing.

2)

df['Name'] = df['Name'].apply(lambda x: isinstance(x, str))

turns the whole ['Name'] series to True, both strings and numbers.

3)

df['Name'] = df[df['Name'].apply(lambda x: isinstance(x, str))]

which gives a ValueError.

I found the result of 2) strange, but discovered that df['Name'].dtype gave me dtype('O'), which I'd never seen before. It suggests the names (strings) and numbers (integers/floats) in the ['Name'] series are the same type (numpy objects). Not sure if/how it's relevant, but I understood it to mean that Python sees both the text and numbers as being the same type.
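
For context, dtype('O') is the pandas "object" dtype. It means the column holds arbitrary Python objects, not that every value has the same Python type. One possible explanation for attempt 2) returning True everywhere is that the numbers were read in as strings such as '123'. This is an assumption, since the original data is not shown. A sketch for checking the per-element types, using made-up data:

```python
import pandas as pd

# Hypothetical data: the same column, once with real ints and once with
# numbers stored as text (as can happen when reading from a file).
mixed = pd.Series(['string1', 123, 45, 'string2'])
all_text = pd.Series(['string1', '123', '45', 'string2'], dtype=object)

# Both columns report the object dtype.
print(mixed.dtype, all_text.dtype)

# Inspecting the Python type of each element tells them apart.
mixed_types = mixed.map(type).value_counts().to_dict()
text_types = all_text.map(type).value_counts().to_dict()
print(mixed_types)
print(text_types)
```

If the second pattern shows up (only str), an isinstance(x, str) test cannot separate names from numbers, and the cell contents have to be inspected instead.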

I'm stuck. Any suggestions on how to remove the numbers and fill the way I explained?

Thanks!

Using apply is not efficient; prefer a vectorized method:

# identify numbers:
m = pd.to_numeric(df['Name'], errors='coerce').notna()

# mask and ffill:
df['Name'] = df['Name'].mask(m).ffill()

Example (assigning to a new column "Name2" for clarity):

       Name    Name2
0   string1  string1
1       123  string1
2       123  string1
3       123  string1
4   string2  string2
5       123  string2
6       123  string2
7       123  string2
8       123  string2
9   string3  string3
10      123  string3
11      123  string3
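
A self-contained sketch of the same approach, which can be run as-is. The sample data is hypothetical and deliberately mixes real numbers with numeric strings, since pd.to_numeric with errors='coerce' handles both:

```python
import pandas as pd

# Hypothetical sample data; numbers appear as ints, floats and numeric strings.
df = pd.DataFrame({'Name': ['string1', 123, '456', 'string2', 7.5, 'string3', '8']})

# True where the value can be parsed as a number, False for the names.
m = pd.to_numeric(df['Name'], errors='coerce').notna()

# Replace numeric cells with NaN, then carry the last name forward.
df['Name2'] = df['Name'].mask(m).ffill()
print(df)
```

Dropping the final .ffill() leaves the numeric cells as NaN, which matches the intermediate "blank" column described in the question.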

You're close. Try this:

import pandas as pd
import numpy as np

df = pd.DataFrame({'Name (actual)': ['string1', 334, 34, 124, 'string2', 23, 11, 89, 76, 'string3', 53, 4]})

df['Name (desired)'] = df['Name (actual)'].apply(lambda x: x if isinstance(x, str) else np.nan).ffill()

print(df)
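
One caveat with the isinstance approach, sketched below on made-up data: it relies on the numbers being real int/float objects. If they are stored as text (an assumption about how the data might have been loaded), every cell is a str and nothing gets blanked. Testing the text itself is one possible fallback. Note that str.isdigit only matches unsigned whole numbers, so decimals or negative values would need a different test.

```python
import numpy as np
import pandas as pd

# Hypothetical variant of the sample data in which numbers are stored as text.
s = pd.Series(['string1', '334', '34', 'string2', '23'])

# isinstance keeps every cell, because each one is a str.
by_type = s.apply(lambda x: x if isinstance(x, str) else np.nan).ffill()

# Keeping only cells that are not digits-only blanks out the numeric text.
by_digits = s.where(~s.str.isdigit()).ffill()

print(by_type.tolist())
print(by_digits.tolist())
```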
