
Pandas/Python: Set value of one column based on value in another column

I need to set the value of one column based on the value of another in a Pandas dataframe. This is the logic:

if df['c1'] == 'Value':
    df['c2'] = 10
else:
    df['c2'] = df['c3']

I am unable to get this to do what I want, which is simply to create a column with new values (or change the values of an existing column; either works for me).

If I run the code above, or write it as a function and use the apply method, I get the following:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
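
For context, the error arises because `if` needs a single True/False, while `df['c1'] == 'Value'` compares every row and returns a whole Series of booleans. A minimal sketch that reproduces it, assuming a made-up three-row frame (the data is invented for illustration):

```python
import pandas as pd

# Hypothetical data, only for illustration
df = pd.DataFrame({'c1': ['a', 'Value', 'b'], 'c3': [1, 2, 3]})

mask = df['c1'] == 'Value'   # element-wise comparison: one boolean per row
print(mask.tolist())         # [False, True, False]

try:
    if mask:                 # `if` asks for one truth value for the whole Series
        pass
except ValueError as exc:
    error_message = str(exc)
    print(error_message)
```

The boolean Series itself is still useful: it can be passed to an indexer or to a vectorized function to pick rows, instead of being used in a plain `if`.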

One way to do this would be to use indexing with .loc.

Example

In the absence of an example dataframe, I'll make one up here:

import numpy as np
import pandas as pd

df = pd.DataFrame({'c1': list('abcdefg')})
df.loc[5, 'c1'] = 'Value'

>>> df
      c1
0      a
1      b
2      c
3      d
4      e
5  Value
6      g

Assuming you wanted to create a new column c2, equivalent to c1 except where c1 is Value, in which case you would like to set it to 10:

First, you could create a new column c2 and set it equal to c1, using one of the following two lines (they essentially do the same thing):

df = df.assign(c2 = df['c1'])
# OR:
df['c2'] = df['c1']

Then, select all the rows where c1 is equal to 'Value' using .loc, and assign your desired value to c2 in those rows:

df.loc[df['c1'] == 'Value', 'c2'] = 10

And you end up with this:

>>> df
      c1  c2
0      a   a
1      b   b
2      c   c
3      d   d
4      e   e
5  Value  10
6      g   g
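
The example above copies c1 because no c3 was given. Applied to the logic in the question, the same pattern might look like the sketch below; the c3 values are invented here.

```python
import pandas as pd

# Made-up frame with the columns named in the question
df = pd.DataFrame({
    'c1': ['a', 'Value', 'b', 'Value'],
    'c3': [1, 2, 3, 4],
})

# Start from c3 (the "else" branch), then overwrite the matching rows
df['c2'] = df['c3']
df.loc[df['c1'] == 'Value', 'c2'] = 10

print(df)
```

Filling the default first and overwriting afterwards means only one boolean mask is needed.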

If, as you suggested in your question, you sometimes just want to replace the values in the column you already have, rather than create a new column, then skip the column creation and do the following:

# Use a single .loc call on the DataFrame. The chained form
# df['c1'].loc[...] = 10 may assign to a temporary copy and leave df unchanged.
df.loc[df['c1'] == 'Value', 'c1'] = 10

Giving you:

>>> df
      c1
0      a
1      b
2      c
3      d
4      e
5     10
6      g

You can use np.where() to set values based on a specified condition:

#df
   c1  c2  c3
0   4   2   1
1   8   7   9
2   1   5   8
3   3   3   5
4   3   6   8

Now change (or set) the values in column c2 based on your condition:

df['c2'] = np.where(df.c1 == 8, 'X', df.c3)

   c1  c2  c3
0   4   1   1
1   8   X   9
2   1   8   8
3   3   5   5
4   3   8   8
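
For the exact logic in the question, the same call would look roughly like this. The frame below is made up; since both branches are numeric here, the resulting column stays numeric.

```python
import numpy as np
import pandas as pd

# Invented data using the column names from the question
df = pd.DataFrame({
    'c1': ['a', 'Value', 'b'],
    'c3': [7, 8, 9],
})

# np.where(condition, value_if_true, value_if_false), evaluated element-wise
df['c2'] = np.where(df['c1'] == 'Value', 10, df['c3'])
print(df['c2'].tolist())
```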

Try:

df['c2'] = df['c1'].apply(lambda x: 10 if x == 'Value' else x)

Note the tilde (~), which inverts the selection. This uses pandas methods, i.e. it is faster than a Python-level if / else.

df.loc[(df['c1'] == 'Value'), 'c2'] = 10
df.loc[~(df['c1'] == 'Value'), 'c2'] = df['c3']

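You can use pandas.DataFrame.mask to add virtually as many conditions as you need:

import numpy as np
import pandas as pd

data = {'a': [1, 2, 3, 4, 5], 'b': [6, 8, 9, 10, 11]}

d = pd.DataFrame.from_dict(data, orient='columns')
# each entry is (value to match in 'a', replacement value or Series)
c = {'c1': (2, 'Value1'), 'c2': (3, 'Value2'), 'c3': (5, d['b'])}

# object dtype so the column can hold both strings and numbers
d['new'] = pd.Series(np.nan, index=d.index, dtype=object)
for match, replacement in c.values():
    # assign the result back rather than using inplace=True on a column selection
    d['new'] = d['new'].mask(d['a'] == match, replacement)

d['new'] = d['new'].fillna('Else')
d

Output: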

    a   b   new
0   1   6   Else
1   2   8   Value1
2   3   9   Value2
3   4   10  Else
4   5   11  11
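
When there are several conditions, np.select is another option. It is a different technique from mask and is shown here only as an alternative sketch. It takes a list of conditions and a matching list of choices, and the first condition that is true for a row wins. The data mirrors the example above, but the replacement values 10, 20 and the default 0 are numeric placeholders chosen for this sketch so that the column keeps a single numeric dtype.

```python
import numpy as np
import pandas as pd

d = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [6, 8, 9, 10, 11]})

# One condition per rule, checked in order
conditions = [d['a'] == 2, d['a'] == 3, d['a'] == 5]
# Placeholder numeric choices (assumed for illustration); the last one copies column 'b'
choices = [10, 20, d['b']]

# Rows matching none of the conditions get the default
d['new'] = np.select(conditions, choices, default=0)
print(d)
```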

I suggest doing it in two steps:

# set fixed value to 'c2' where the condition is met
df.loc[df['c1'] == 'Value', 'c2'] = 10

# copy value from 'c3' to 'c2' where the condition is NOT met
# (the right-hand side also needs .loc to select rows and a column together)
df.loc[df['c1'] != 'Value', 'c2'] = df.loc[df['c1'] != 'Value', 'c3']

Try df.apply() if you have a small or medium-sized dataframe:

df['c2'] = df.apply(lambda x: 10 if x['c1'] == 'Value' else x['c1'], axis = 1)

Otherwise, if you have a big dataframe, follow the slicing techniques mentioned in the answers above.
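
To see how much the choice matters on your own data, a rough timing sketch along these lines may help. The frame size and contents are arbitrary assumptions, the else branch uses c3 as in the question, and the numbers will vary by machine, so no timings are quoted here.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data; the size and values are arbitrary assumptions
n = 20_000
df = pd.DataFrame({
    'c1': np.where(np.arange(n) % 5 == 0, 'Value', 'other'),
    'c3': np.arange(n),
})

def with_apply():
    # Row-by-row Python function call
    return df.apply(lambda x: 10 if x['c1'] == 'Value' else x['c3'], axis=1)

def with_where():
    # Vectorized: one comparison and one selection over whole columns
    return pd.Series(np.where(df['c1'] == 'Value', 10, df['c3']), index=df.index)

apply_seconds = timeit.timeit(with_apply, number=3)
where_seconds = timeit.timeit(with_where, number=3)
print(f"apply: {apply_seconds:.3f}s  np.where: {where_seconds:.3f}s")
```

Both functions return the same values, so the comparison is only about speed.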

I believe Series.map() is very readable and efficient, e.g.:

df["c2"] = df["c1"].map(lambda x: 10 if x == 'Value' else x)

I like it because if the conditional logic gets more complex you can move it to a function and just pass in that function instead of the lambda.

If you need to base your conditional logic on more than one column, you can use DataFrame.apply() as others suggest.
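
A sketch of both variants with named functions in place of lambdas. The function names `replace_value` and `pick_c2` and the data are made up for illustration.

```python
import pandas as pd

# Invented data using the column names from the question
df = pd.DataFrame({
    'c1': ['a', 'Value', 'b'],
    'c3': [1, 2, 3],
})

def replace_value(x):
    """Single-column rule: return 10 for 'Value', otherwise keep x."""
    return 10 if x == 'Value' else x

def pick_c2(row):
    """Two-column rule from the question: 10 for 'Value', otherwise c3."""
    return 10 if row['c1'] == 'Value' else row['c3']

# Series.map passes one value at a time
df['from_map'] = df['c1'].map(replace_value)
# DataFrame.apply with axis=1 passes one row at a time
df['from_apply'] = df.apply(pick_c2, axis=1)
print(df)
```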

I had a big dataset and .loc[] was taking too long, so I found a vectorized way to do it. Recall that you can set a column to the result of a logical comparison, so this works:

file['Flag'] = (file['Claim_Amount'] > 0)

This gives a Boolean, which is what I wanted, but you can multiply it by, say, 1 to make it an integer.
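
A small sketch of the flag column and its integer form. Here `claims` stands in for the `file` dataframe above, the amounts are made up, and `astype(int)` is used as a more explicit alternative to multiplying by 1.

```python
import pandas as pd

# `claims` stands in for the `file` DataFrame above; the amounts are invented
claims = pd.DataFrame({'Claim_Amount': [0, 125.5, 0, 40]})

claims['Flag'] = claims['Claim_Amount'] > 0       # boolean column
claims['Flag_int'] = claims['Flag'].astype(int)   # True -> 1, False -> 0
print(claims)
```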

