Assign columns' value from other columns in Pandas dataframe

How do I assign columns in my dataframe to be equal to another column where a condition is met?

Update
The problem
I need to assign values to many columns when a condition is met. Some of those values are constants, and some come from another column in the same row.

The condition is not the problem.

I need an efficient way to do this:

df.loc[some condition it doesn't matter,
['a','b','c','d','e','f','g','x','y']]=df['z'],1,3,4,5,6,7,8,df['p']

Simplified example data

import pandas as pd

d = {'var' : pd.Series([10,61]),
'c' : pd.Series([100,0]),
'z' : pd.Series(['x','x']),
'y' : pd.Series([None,None]),
'x' : pd.Series([None,None])}
df = pd.DataFrame(d)

Condition: var is not missing and its first digit is less than 5
Result: set df.x = df.z and df.y = 1

Here is pseudocode that doesn't work, but it shows what I want.

df.loc[((df['var'].dropna().astype(str).str[0].astype(int) < 5)),
['x','y']]=df['z'],1

but I get

ValueError: cannot set using a list-like indexer with a different length than the value

Ideal output

     c  var     x  z     y
0  100   10     x  x     1
1    0   61  None  x  None

The code below works, but it is too inefficient because I need to assign values to multiple columns.

df.loc[((df['var'].dropna().astype(str).str[0].astype(int) < 5)),
['x']]=df['z']
df.loc[((df['var'].dropna().astype(str).str[0].astype(int) < 5)),
['y']]=1

This is one way of doing it:

import pandas as pd
import numpy as np

d = {'var' : pd.Series([1,6]),
'c' : pd.Series([100,0]),
'z' : pd.Series(['x','x']),
'y' : pd.Series([None,None]),
'x' : pd.Series([None,None])}
df = pd.DataFrame(d)

# Condition 1: var is not missing
cond1 = df['var'].notna()
# Condition 2: first digit is less than 5
# (this lambda assumes var has no NaN, as in the sample data)
cond2 = df['var'].apply(lambda x: int(str(x)[0])) < 5
mask = cond1 & cond2
# .ix was removed in pandas 1.0; .loc does the same selection here
df.loc[mask, 'x'] = df.loc[mask, 'z']
df.loc[mask, 'y'] = 1
print(df)

Output:

     c  var     x     y  z
0  100    1     x     1  x
1    0    6  None  None  x

As you can see, the Boolean mask is applied on both sides of the assignment, and the value 1 is broadcast down the y column. (pandas aligns a Series on the index when assigning, so df.loc[mask, 'x'] = df['z'] gives the same result; masking the right-hand side just makes the intent explicit.) It is probably cleaner to split the steps into multiple lines.
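If var can actually contain missing values, the mask itself needs some care. The following is a minimal sketch, not the answerer's code: it assumes var holds non-negative numbers, adds a third row with NaN purely for illustration, and builds the mask once so it can be reused for every assignment.

```python
import numpy as np
import pandas as pd

# Sample data from the question plus one illustrative row with a missing var.
df = pd.DataFrame({
    "var": [10, 61, np.nan],
    "c": [100, 0, 5],
    "z": ["x", "x", "x"],
    "y": [None, None, None],
    "x": [None, None, None],
})

# First digit of every non-missing value (assumes non-negative numbers).
first_digit = df["var"].dropna().astype(str).str[0].astype(int)

# Put the dropped rows back as False so the mask lines up with df's index.
mask = (first_digit < 5).reindex(df.index, fill_value=False).astype(bool)

# Reuse the same mask for each assignment.
df.loc[mask, "x"] = df["z"]  # a Series on the right is aligned on the index
df.loc[mask, "y"] = 1        # a scalar is broadcast to the selected rows
print(df)
```

Reindexing matters because a Boolean Series that is shorter than the frame (which is what dropna() leaves behind when values are missing) cannot be used directly as a row indexer.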

Edit, after the question was updated: more generally, since some assignments copy from other columns and others just broadcast a constant down a column, you can do it in two steps:

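# .to_numpy() drops the column labels; without it pandas tries to align
# 'z','p' against 'a','y' and writes NaN instead of the values
df.loc[conds, ['a','y']] = df.loc[conds, ['z','p']].to_numpy()
# one constant per target column, broadcast down the selected rows
df.loc[conds, ['b','c','d','e','f','g','x']] = [1,3,4,5,6,7,8]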

You can profile this to see whether it is efficient enough for your use case.
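As a rough, self-contained sketch of the two-step idea on toy data: the column names a, b, c, y, z, p follow the question, the values are made up for illustration, and the target columns are created up front as an assumption so that .loc has existing columns to write into.

```python
import pandas as pd

# Toy data; the values are illustrative only.
df = pd.DataFrame({
    "var": [10, 61, 23, 45],
    "z": ["x", "y", "w", "v"],
    "p": ["p0", "p1", "p2", "p3"],
})

# Create the target columns first (an assumption of this sketch).
for col in ["a", "b", "c", "y"]:
    df[col] = None

# Condition computed once: first digit of var is less than 5.
conds = df["var"].astype(str).str[0].astype(int) < 5

# Step 1: copy values from other columns. to_numpy() strips the labels,
# so 'z' goes into 'a' and 'p' goes into 'y' by position.
df.loc[conds, ["a", "y"]] = df.loc[conds, ["z", "p"]].to_numpy()

# Step 2: one constant per target column, broadcast down the rows.
df.loc[conds, ["b", "c"]] = [1, 3]
print(df)
```

Both steps reuse the same conds, so the condition is evaluated once no matter how many columns are assigned.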

You can work row-wise:

def f(row):
    # pd.notna also catches NaN, which is how pandas stores missing numbers
    if pd.notna(row['var']) and int(str(row['var'])[0]) < 5:
        row[['x', 'y']] = row['z'], 1
    return row

>>> df.apply(f, axis=1)
     c  var     x   y  z
0  100   10     x   1  x
1    0   61  None NaN  x

To overwrite the original df:

df = df.apply(f, axis=1)
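Row-wise apply calls a Python function once per row, which can be slow on large frames. A possible vectorized alternative, swapped in here as a different technique from the answer above, uses Series.mask to replace values where the condition holds. This sketch assumes var has no missing values, as in the sample data.

```python
import pandas as pd

df = pd.DataFrame({
    "var": [10, 61],
    "c": [100, 0],
    "z": ["x", "x"],
    "y": [None, None],
    "x": [None, None],
})

# True where the first digit of var is less than 5.
mask = df["var"].astype(str).str[0].astype(int) < 5

# Series.mask(cond, other) takes `other` where cond is True and keeps the
# original value elsewhere; assign returns a new frame with those columns.
df = df.assign(
    x=df["x"].mask(mask, df["z"]),
    y=df["y"].mask(mask, 1),
)
print(df)
```

Each target column still gets its own expression, but the condition is computed once and no per-row Python function is involved.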


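Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.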
