
Replace value from a column based on condition of another column, Pandas

Starting DataFrame

import numpy as np
import pandas as pd

df = pd.DataFrame({'Column A' : ['red','green','yellow', 'orange', 'red', 'blue'],
                   'Column B' : [np.nan, 'blue', 'purple', np.nan, np.nan, np.nan],
                   'Column C' : [1, 2, 3, 2, 3, 7]})

Column A  Column B  Column C
'red'     NaN       1
'green'   'blue'    2
'yellow'  'purple'  3
'orange'  NaN       2
'red'     NaN       3
'blue'    NaN       7

Desired Result

Column A  Column B  Column C
'red'     NaN       1
'blue'    'blue'    2
'purple'  'purple'  3
'orange'  NaN       2
'red'     NaN       3
'blue'    NaN       7

I want to replace the value in Column A with the value in Column B, but only for rows where Column B is not NaN.

So that I can run the following code:

df[['Column A', 'Column C']].groupby('Column A').sum()

Which would result in the following DataFrame:

Column A  Column C
'red'     4
'blue'    9
'purple'  3
'orange'  2

I am trying to replace categories before doing a groupby call.

Attempts:

The DataFrame I am working with has a sequential numeric index going from 0 to N.
So I could hard code the following:
df.iloc[[index], column] = some_string
I do not want to do this, as it is not dynamic and the DataFrame data could change.

I believe I could use .agg() or .apply() on either the df or the df.groupby(), but this is where I have struggled.

Particularly with how to write a function to use with .agg() or .apply().

Say:

def my_func(x):
    print(x)

Then:
df.apply(my_func)
The result is the first column of df printed.
Or:
df.apply(my_func, axis = 1)

The result is the following format for each row:

Column A    red
Column B    NaN
Column C    1
Name: 0, dtype: object
Column A    green
Column B    blue
Column C    2
Name: 1, dtype: object

I am not sure how to access each column per row in my_func.

Edit:
I am trying to find a way to change the value in Column A if the value in Column B, for that row, is not NaN. The replacement value is the value in Column B; the value being replaced is the value in Column A.

But I want to do this dynamically, meaning not hardcoded as I showed with:
df.iloc[[index], column] = some_string

As you mentioned, you could use DataFrame.apply with axis=1 like this:

df['Column A'] = df.apply(lambda x: x['Column B'] if pd.notna(x['Column B']) else x['Column A'], axis=1)

  Column A Column B  Column C
0      red      NaN         1
1     blue     blue         2
2   purple   purple         3
3   orange      NaN         2
4      red      NaN         3
5     blue      NaN         7
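
On the question of how to access each column per row: with axis=1, apply passes every row to the function as a Series indexed by column name, so individual columns can be read by label. The following is a minimal sketch of that idea written as a named function. The name pick_color is hypothetical, and pd.notna is used here as the missing-value test, assuming the gaps in Column B are real NaN values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Column A': ['red', 'green', 'yellow', 'orange', 'red', 'blue'],
                   'Column B': [np.nan, 'blue', 'purple', np.nan, np.nan, np.nan],
                   'Column C': [1, 2, 3, 2, 3, 7]})

def pick_color(row):
    # With axis=1, `row` is a Series whose index holds the column names,
    # so each column of the current row is available as row['<column name>'].
    if pd.notna(row['Column B']):
        return row['Column B']
    return row['Column A']

# apply returns one value per row; assign it back to Column A.
df['Column A'] = df.apply(pick_color, axis=1)
print(df)
```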

Note that apply with axis=1 calls the function once per row, so it is slow and not advisable for very large datasets. There are some good answers out there describing alternative methods.
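
One such alternative, sketched here under the assumption that the missing entries in Column B are real NaN values (np.nan) rather than the string 'NaN', is to skip apply and work on whole columns. Series.fillna accepts another Series and aligns it on the index, so Column B is kept where it has a value and Column A is used everywhere else. The variable name totals is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Column A': ['red', 'green', 'yellow', 'orange', 'red', 'blue'],
                   'Column B': [np.nan, 'blue', 'purple', np.nan, np.nan, np.nan],
                   'Column C': [1, 2, 3, 2, 3, 7]})

# Take Column B where it is not NaN, otherwise fall back to Column A.
df['Column A'] = df['Column B'].fillna(df['Column A'])

# An equivalent boolean-mask form, if an explicit condition is preferred:
# mask = df['Column B'].notna()
# df.loc[mask, 'Column A'] = df.loc[mask, 'Column B']

# The groupby from the question can then run on the updated column.
totals = df[['Column A', 'Column C']].groupby('Column A').sum()
print(totals)
```

By default groupby sorts the group keys, so the rows of totals come out in alphabetical order rather than in the order shown in the question.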


