Replacing values in a large number of columns with another column's value based on a condition
I have this data:
id | d1 | d2 | d3 | .... | d64 | FINAL_GRADE
1 | 0 | 15 | 0 | .... | 23 | 95
2 | 8 | 0 | 12 | .... | 0 | 75
I want to replace every non-zero value in each row with that row's value in the FINAL_GRADE column, and obtain this table:
id | d1 | d2 | d3 | .... | d64 | FINAL_GRADE
1 | 0 | 95 | 0 | .... | 95 | 95
2 | 75 | 0 | 75 | .... | 0 | 75
Here is my code:
df[df.ix[:, 1:63] != 0] = df['FINAL_GRADE']
But I am receiving this error: TypeError: Cannot do inplace boolean setting on mixed-types with a non np.nan value
I wonder whether my code has an issue, or whether my approach is totally wrong. I appreciate any help!
One possibility is to use the DF.mask() method with a boolean mask built from the d-columns.
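As a quick illustration of what DF.mask does, here is a minimal sketch on a hypothetical two-row frame (not the question's full data): mask(cond, other) keeps each value where cond is False and substitutes other where cond is True, and axis=0 makes pandas align a Series passed as other with the row index.

```python
import pandas as pd

# Hypothetical two-row frame with two "d" columns and a FINAL_GRADE column
df = pd.DataFrame({"d1": [0, 8], "d2": [15, 0], "FINAL_GRADE": [95, 75]})
d_cols = ["d1", "d2"]

# mask(cond, other): keep values where cond is False, use `other` where it is True.
# axis=0 aligns the FINAL_GRADE Series with the row index, so every non-zero
# cell receives the FINAL_GRADE of its own row.
masked = df[d_cols].mask(df[d_cols] != 0, df["FINAL_GRADE"], axis=0)
print(masked)
```

mask returns a new DataFrame; df itself is left unchanged.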
Using .ix for setting values presumably fails here because you are operating on a subset of columns with mixed dtypes: float (produced by generating the boolean mask and subsetting) and int (the FINAL_GRADE values being looked up). This would be the main cause of the TypeError.
Steps:
1) Subset the DataFrame by selecting the columns whose names start with the character d, using str.startswith.
2) Using DF.mask, wherever the values in this subset are non-zero, replace them with the FINAL_GRADE value from the same row; specifying axis=0 aligns the FINAL_GRADE Series with the rows.
3) Finally, concatenate id, the masked DF and FINAL_GRADE column-wise using pd.concat (axis=1).
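import pandas as pd
# Columns whose names start with 'd' (d1 ... d64)
sub_df = df[df.columns[df.columns.str.startswith('d')]]
# Non-zero cells take that row's FINAL_GRADE; axis=0 aligns the Series on the row index
mask_df = sub_df.mask(sub_df != 0, df['FINAL_GRADE'], axis=0)
pd.concat([df['id'], mask_df, df['FINAL_GRADE']], axis=1)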
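To check the three steps end to end, here is a self-contained sketch on a hypothetical stand-in for the question's data that keeps only d1 to d3 (the real frame has d1 to d64; the values are the first three d-columns of the question's sample). It selects the columns with df.loc and the boolean array from str.startswith, which picks the same columns as the df[df.columns[...]] form.

```python
import pandas as pd

# Hypothetical stand-in for the question's data: only d1..d3 instead of d1..d64
df = pd.DataFrame({
    "id": [1, 2],
    "d1": [0, 8],
    "d2": [15, 0],
    "d3": [0, 12],
    "FINAL_GRADE": [95, 75],
})

# Step 1: every column whose name starts with "d"
sub_df = df.loc[:, df.columns.str.startswith("d")]
# Step 2: non-zero cells take FINAL_GRADE from the same row
mask_df = sub_df.mask(sub_df != 0, df["FINAL_GRADE"], axis=0)
# Step 3: put id, the masked columns and FINAL_GRADE back side by side
result = pd.concat([df["id"], mask_df, df["FINAL_GRADE"]], axis=1)
print(result)
```

pd.concat builds a new frame, so assign result back to df if you want to keep it.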
The following might be slightly cruder than strictly necessary, but I think it is a clean and general fit for your problem:
cols = df.columns[1:-1]  # d1 ... d64: skip 'id' (first) and 'FINAL_GRADE' (last)
for idx, row in df.iterrows():
    # iterrows() may yield copies, so write back through df.loc instead of into row
    df.loc[idx, cols[row[cols].to_numpy() != 0]] = row['FINAL_GRADE']
Note that I'm doing a couple of things here, so some notes:
- df.columns[1:-1] addresses every column except the first (id) and the last (FINAL_GRADE), so .ix is not necessary, and you are not locked into the case where you have exactly 64 columns.
- idx is the row label that iterrows() automatically gives me with each row; it is what lets the new values be written back into df.
- Try to use .loc more than .ix, because it leverages the semantic benefit that labelling your data gives you. .ix was deprecated in pandas 0.20 and removed in pandas 1.0.
I'll try to think of a solution without a for loop that is considered pythonic and not too contrived or unreadable.
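A runnable sketch of the loop approach, on a hypothetical stand-in frame with only d1 to d3. It skips id and FINAL_GRADE by position and writes each row's changes back with df.loc, because assigning into the row yielded by iterrows() is not guaranteed to change df.

```python
import pandas as pd

# Hypothetical stand-in for the question's data: only d1..d3 instead of d1..d64
df = pd.DataFrame({
    "id": [1, 2],
    "d1": [0, 8],
    "d2": [15, 0],
    "d3": [0, 12],
    "FINAL_GRADE": [95, 75],
})

# Every column between 'id' (first) and 'FINAL_GRADE' (last)
cols = df.columns[1:-1]
for idx, row in df.iterrows():
    # Labels of the d-columns that are non-zero in this row
    nonzero_cols = cols[row[cols].to_numpy() != 0]
    # Write through df.loc: assigning into `row` is not guaranteed to reach df
    df.loc[idx, nonzero_cols] = row["FINAL_GRADE"]

print(df)
```

A Python-level loop like this is generally much slower than the vectorized mask/where versions on large frames, so it is mainly useful when the per-row logic is hard to vectorize.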
EDIT: Found a one-liner that is, in my opinion, both readable and simple/general enough to be applied to other, similar problems:
cols = df.columns[1:-1]  # only the d-columns, so 'id' is not overwritten
df[cols] = df[cols].where(df[cols] == 0, df['FINAL_GRADE'], axis=0)
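A self-contained sketch of the where approach on a hypothetical d1..d3 stand-in. It is restricted to the columns between id and FINAL_GRADE so that id is not overwritten, and it uses plain column selection because .ix no longer exists in current pandas. It also shows an alternative that is not taken from either answer: since zeros stay 0 and every non-zero cell becomes FINAL_GRADE, the result equals the boolean (cell != 0) multiplied row-wise by FINAL_GRADE, via DataFrame.ne and DataFrame.mul(..., axis=0).

```python
import pandas as pd

# Hypothetical stand-in for the question's data: only d1..d3 instead of d1..d64
original = pd.DataFrame({
    "id": [1, 2],
    "d1": [0, 8],
    "d2": [15, 0],
    "d3": [0, 12],
    "FINAL_GRADE": [95, 75],
})
cols = original.columns[1:-1]  # d-columns only, so 'id' is never overwritten

# where(cond, other): keep cells where cond is True (the zeros) and take the
# row's FINAL_GRADE elsewhere; axis=0 aligns the Series with the row index
df = original.copy()
df[cols] = df[cols].where(df[cols] == 0, df["FINAL_GRADE"], axis=0)

# Alternative (not from the answers): zeros stay 0 and non-zero cells become
# FINAL_GRADE, which is the boolean (cell != 0) times FINAL_GRADE, row by row
alt = original.copy()
alt[cols] = alt[cols].ne(0).mul(alt["FINAL_GRADE"], axis=0)

print(df)
```

The multiplication trick only works because the value kept for zero cells is 0; for any other "keep" rule, where/mask is the more general tool.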
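Notice: the technical posts on this site are published under the CC BY-SA 4.0 license. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.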