
Replacing values in a large number of columns with another column's value based on a condition

I have this data:

id   |  d1   |  d2  |  d3  | .... |  d64   | FINAL_GRADE
1    |  0    |  15  |  0   | .... |  23    | 95
2    |  8    |  0   |  12  | .... |  0     | 75   

I want to replace all non-zero values in each row with the corresponding value in the FINAL_GRADE column, to obtain this table:

id   |  d1   |  d2  |  d3  | .... |  d64   | FINAL_GRADE
1    |  0    |  95  |  0   | .... |  95    | 95
2    |  75   |  0   |  75  | .... |  0     | 75   

Here is my code:

df[df.ix[:, 1:63] != 0] = df['FINAL_GRADE']

But I am receiving this error: TypeError: Cannot do inplace boolean setting on mixed-types with a non np.nan value

I wonder whether my code has an issue, or whether my approach is totally wrong. I appreciate any help!

One possibility would be to use the DataFrame.mask() method with the boolean mask you created.

Setting values through the boolean mask fails here presumably because df has mixed dtypes. For in-place boolean setting (df[mask] = value), pandas checks the dtypes of the whole frame, and when they are not all numeric (for example, an object column alongside the integer ones) it only accepts np.nan as the value to set. Any other value, including the FINAL_GRADE Series, raises exactly this TypeError, which is why it is easier to build a new, purely numeric frame with mask() than to assign in place.
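To see which dtypes a frame actually mixes before attempting a boolean assignment, you can list them and pull out the non-numeric columns. A minimal sketch, assuming a hypothetical toy frame in which id happens to be stored as strings:

```python
import pandas as pd

# Hypothetical toy frame: 'id' is stored as strings, so the frame mixes
# numeric and non-numeric columns (an assumption for illustration only).
df = pd.DataFrame({
    "id": ["1", "2"],
    "d1": [0, 8],
    "d2": [15, 0],
    "FINAL_GRADE": [95, 75],
})

# Per-column dtypes
print(df.dtypes)

# Columns that are not numeric; a non-empty result means the frame is not
# purely numeric
non_numeric = df.select_dtypes(exclude="number").columns
print(list(non_numeric))
```

If that list is non-empty, restricting the operation to the numeric columns (as the steps that follow do) avoids mixing dtypes in the first place.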

Steps:

1) Subset the DataFrame by selecting the columns whose names start with the character d, using str.startswith.

2) Using DF.mask, replace the values in this subset wherever they are non-zero with the contents of FINAL_GRADE, aligned row-wise by specifying axis=0.

3) Finally, concatenate id, the masked DataFrame and FINAL_GRADE column-wise using pd.concat (axis=1).


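import pandas as pd

sub_df = df[df.columns[df.columns.str.startswith('d')]]  # d1 ... d64
# replace non-zero values with the same row's FINAL_GRADE (axis=0 aligns on the index)
mask_df = sub_df.mask(sub_df != 0, df['FINAL_GRADE'], axis=0)
df = pd.concat([df['id'], mask_df, df['FINAL_GRADE']], axis=1)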
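For a self-contained run of the three steps, here is a sketch on a hypothetical toy frame built from the question's two rows, trimmed to d1 to d3 (the real frame has d1 to d64, which are not reproduced here):

```python
import pandas as pd

# Toy version of the question's data, trimmed to d1..d3 for illustration
df = pd.DataFrame({
    "id": [1, 2],
    "d1": [0, 8],
    "d2": [15, 0],
    "d3": [0, 12],
    "FINAL_GRADE": [95, 75],
})

# Step 1: columns whose names start with 'd'
sub_df = df[df.columns[df.columns.str.startswith("d")]]

# Step 2: where a value is non-zero, take FINAL_GRADE from the same row
# (axis=0 aligns the Series on the row index)
mask_df = sub_df.mask(sub_df != 0, df["FINAL_GRADE"], axis=0)

# Step 3: put id, the masked columns and FINAL_GRADE back together
result = pd.concat([df["id"], mask_df, df["FINAL_GRADE"]], axis=1)
print(result)
```

Because mask() returns a new frame built only from the d columns, no in-place boolean setting on the mixed-dtype original is involved.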


The following might be slightly cruder than strictly necessary, but I think it is a clean and general fit for your problem:

cols = df.columns[1:-1]  # every column except 'id' (first) and 'FINAL_GRADE' (last)
for idx, row in df.iterrows():
    # row may be a copy of the data, so write back through df.loc rather than into row
    df.loc[idx, cols[(row[cols] != 0).to_numpy()]] = row.FINAL_GRADE

A few notes on what this does:

  1. df.columns[1:-1] addresses every column except the first and the last, so .ix is not necessary, and you are not locked into having exactly 64 columns. Skipping the first column also keeps id from being overwritten.
  2. I am looping over all rows, which is generally not the most efficient approach, but I find it readable and good enough for cases like yours that are not high-performance calculations repeated hundreds of times.
  3. iterrows() yields (index, row) pairs. Depending on the dtypes, row can be a copy rather than a view of df, so assigning into it is not guaranteed to change df; the index idx is therefore used to write back through df.loc.
  4. Prefer .loc over .ix, because .loc leverages the semantic benefit that labelling your data gives you. .ix was deprecated in pandas 0.20 and removed in pandas 1.0.
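The loop version can be tried end to end on the same kind of toy data. This is a sketch, assuming a hypothetical frame trimmed to d1 to d3 with id first and FINAL_GRADE last; it writes back through df.loc using the row label that iterrows() yields:

```python
import pandas as pd

# Hypothetical toy frame shaped like the question (only d1..d3 shown)
df = pd.DataFrame({
    "id": [1, 2],
    "d1": [0, 8],
    "d2": [15, 0],
    "d3": [0, 12],
    "FINAL_GRADE": [95, 75],
})

# Every column between 'id' (first) and 'FINAL_GRADE' (last)
grade_cols = df.columns[1:-1]

for idx, row in df.iterrows():
    # Labels of the grade columns whose value is non-zero in this row
    nonzero = grade_cols[(row[grade_cols] != 0).to_numpy()]
    if len(nonzero):  # skip rows with nothing to replace
        # Write through df.loc: the yielded row may be a copy
        df.loc[idx, nonzero] = row["FINAL_GRADE"]

print(df)
```

The explicit write-back is the design choice that matters here: it works the same whether iterrows() hands out a view or a copy.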

I'll try to think of a solution without a for loop that is considered pythonic and is not too contrived or unreadable.

EDIT: Found a one-liner that is, in my opinion, both readable and simple/general enough to be applied to other, similar problems:

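# keep zeros; replace every non-zero value between 'id' and 'FINAL_GRADE' with that row's grade
df.iloc[:, 1:-1] = df.iloc[:, 1:-1].where(df.iloc[:, 1:-1] == 0, df.FINAL_GRADE, axis=0)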
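DataFrame.where keeps values where the condition is True and replaces the rest, while DataFrame.mask does the opposite, so a where-based one-liner and the mask-based answer are mirror images of each other. A sketch comparing them on a hypothetical toy frame (d1 to d3 standing in for d1 to d64), using .iloc positions to skip id and FINAL_GRADE:

```python
import pandas as pd

# Hypothetical toy frame (d1..d3 stand in for d1..d64)
df = pd.DataFrame({
    "id": [1, 2],
    "d1": [0, 8],
    "d2": [15, 0],
    "d3": [0, 12],
    "FINAL_GRADE": [95, 75],
})

grades = df.iloc[:, 1:-1]  # all columns between 'id' and 'FINAL_GRADE'

# where(): keep values where the condition is True (zeros), replace the rest
via_where = grades.where(grades == 0, df["FINAL_GRADE"], axis=0)

# mask(): replace values where the condition is True (non-zeros)
via_mask = grades.mask(grades != 0, df["FINAL_GRADE"], axis=0)

# The two calls are mirror images and give the same frame
print(via_where.equals(via_mask))

# Write the result back into the original columns by position
df.iloc[:, 1:-1] = via_where
print(df)
```

Selecting the block by position (.iloc[:, 1:-1]) assumes id is the first column and FINAL_GRADE the last, as in the question's table.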

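Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you need to repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.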

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM