
Pandas: Iterate over existing columns and create new columns based on conditionals

The closest existing question to mine is found here, but I'm running into a hiccup somewhere.

My dataframe:

import numpy as np
import pandas as pd

df = pd.DataFrame({'KEY': ['100000003', '100000009', '100000009', '100000009'],
              'RO_1': [1, 1, 4,1],
              'RO_2': [1, 0, 0,0],
              'RO_3': [1, 1, 1,1],
              'RO_4': [1, 4, 1,1]})

    KEY         RO_1  RO_2   RO_3 RO_4 
0   100000003   1      1     1    1   
1   100000009   1      0     1    4    
2   100000009   4      0     1    1    
3   100000009   1      0     1    1   

I want to create 4 additional columns labeled 'Month1' through 'Month4'. Something simple like:

for i in range(3):
    df.loc[1,'Month'+str(i)] = 1 # '1' is just there as a place holder

However, I get a warning message when I execute this code:

"A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead"

I want to combine this with conditionals to fill in each cell for each column and each row.

The code below creates only a single column, flagged according to whether a column whose name contains RO_ meets either condition (a value of 4 or 1):

namelist = df.columns.tolist()  # Index.get_values() is gone from current pandas
ROList = [s for s in namelist if "RO_" in s]
for col in ROList:
    for i in range(3):
        df['Month'] = np.where(np.logical_or(df[col]==4,df[col]==1), '1', '0') 
df

I tried combining the two pieces of code, but I am missing a fundamental understanding of how to do this. Any help would be great.

Final expected result:

    KEY         RO_1  RO_2   RO_3 RO_4 Month1 Month2 Month3 Month4
0   100000003   1      1     1    1    1      1      1      1
1   100000009   1      0     1    4    1      0      1      1
2   100000009   4      0     1    1    1      0      1      1  
3   100000009   1      0     1    1    1      0      1      1 

IIUC, use enumerate:

ROList = [s for s in df.columns if "RO_" in s]
for i, col in enumerate(ROList):
    # enumerate is zero-based, so add 1 to start the names at Month1
    df['Month'+str(i+1)] = np.where(np.logical_or(df[col]==4, df[col]==1), '1', '0')
df
Out[194]: 
         KEY  RO_1  RO_2  RO_3  RO_4 Month1 Month2 Month3 Month4
0  100000003     1     1     1     1      1      1      1      1
1  100000009     1     0     1     4      1      0      1      1
2  100000009     4     0     1     1      1      0      1      1
3  100000009     1     0     1     1      1      0      1      1
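The loop above assigns whole columns to df, which does not by itself reproduce the warning quoted in the question when df is built directly with pd.DataFrame. A common cause of that warning, assumed here because the question does not show it, is that df was itself obtained by filtering another DataFrame. A minimal sketch under that assumption, where `source` and its extra row are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical larger frame; the question does not show where df came from.
source = pd.DataFrame({
    'KEY': ['100000003', '100000009', '100000009', '100000009', '100000011'],
    'RO_1': [1, 1, 4, 1, 0],
    'RO_2': [1, 0, 0, 0, 2],
})

# Filtering derives df from source; .copy() makes df an independent frame,
# so the column assignments below unambiguously target df only.
df = source[source['KEY'] != '100000011'].copy()

ro_cols = [c for c in df.columns if c.startswith('RO_')]
for i, col in enumerate(ro_cols, start=1):
    # Same 1-or-4 test as the loop above, written with isin
    df['Month' + str(i)] = np.where(df[col].isin([1, 4]), '1', '0')

print(df)
```

If the warning persists in your own code, checking how df was created is usually the first step.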

Your logic seems to amount to changing 4 to 1:

df.assign(**df.loc[:, ROList]
            .mask(df.loc[:, ROList] == 4, 1)
            .rename(columns=dict(zip(ROList, range(1, len(ROList) + 1))))
            .add_prefix('Month'))
Out[15]: 
         KEY  RO_1  RO_2  RO_3  RO_4  Month1  Month2  Month3  Month4
0  100000003     1     1     1     1       1       1       1       1
1  100000009     1     0     1     4       1       0       1       1
2  100000009     4     0     1     1       1       0       1       1
3  100000009     1     0     1     1       1       0       1       1
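Replacing 4 with 1 matches the expected result only because the sample data contains nothing but 0, 1 and 4. A small sketch of where the two readings diverge, using a hypothetical frame that contains a 2 (a value that does not appear in the question):

```python
import pandas as pd

# Hypothetical data: the 2 in RO_2 does not occur in the question's frame.
df = pd.DataFrame({'RO_1': [1, 4, 0], 'RO_2': [2, 1, 4]})

# Reading 1: replace 4 with 1 and leave every other value untouched.
masked = df.mask(df == 4, 1)

# Reading 2: flag values equal to 1 or 4 as 1, everything else as 0.
flagged = df.isin([1, 4]).astype(int)

print(masked)
print(flagged)
```

With mask the 2 passes through unchanged, while the isin flag turns it into 0, so the choice depends on which values can actually occur in the RO_ columns.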

Use filter + isin + rename for a single pipelined transformation of your data.

v = (df.filter(regex='^RO_')    # select columns
      .isin([4, 1])             # check if the value is 4 or 1
      .astype(int)              # convert the `bool` result to `int`
      .rename(                  # rename columns
          columns=lambda x: x.replace('RO_', 'Month')
      ))

Or, for the sake of performance,

v = df.filter(regex='^RO_')\
          .isin([4, 1])\
          .astype(int) 
v.columns = v.columns.str.replace('RO_', 'Month')  

Finally, concatenate the result with the original.

pd.concat([df, v], axis=1)

         KEY  RO_1  RO_2  RO_3  RO_4  Month1  Month2  Month3  Month4
0  100000003     1     1     1     1       1       1       1       1
1  100000009     1     0     1     4       1       0       1       1
2  100000009     4     0     1     1       1       0       1       1
3  100000009     1     0     1     1       1       0       1       1
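For reuse, the steps above could be wrapped in a small function. This is only a sketch: the name `add_month_flags` and its `values` parameter are made up for illustration and are not part of pandas.

```python
import pandas as pd

def add_month_flags(frame, values=(1, 4)):
    """Return frame plus one MonthN flag column per RO_N column (hypothetical helper)."""
    flags = (frame.filter(regex='^RO_')        # select the RO_ columns
                  .isin(list(values))          # True where the value is one of `values`
                  .astype(int)                 # bool -> 0/1
                  .rename(columns=lambda c: c.replace('RO_', 'Month')))
    # The input frame is left unmodified; a new frame is returned
    return pd.concat([frame, flags], axis=1)

df = pd.DataFrame({'KEY': ['100000003', '100000009', '100000009', '100000009'],
                   'RO_1': [1, 1, 4, 1],
                   'RO_2': [1, 0, 0, 0],
                   'RO_3': [1, 1, 1, 1],
                   'RO_4': [1, 4, 1, 1]})

result = add_month_flags(df)
print(result)
```

Returning a new frame rather than assigning into df keeps the original data intact, which is convenient when the flags are only needed for one downstream step.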

It seems you are creating a new column for each existing RO_ column in your dataframe. You can do something like:

# Only the RO_ columns get a Month counterpart; KEY is skipped
original_cols = [c for c in df.columns if c.startswith("RO_")]
for c in original_cols:
    cname = "Month" + c.split("_")[-1]
    df[cname] = df[c].apply(lambda x: 1 if (x == 1) or (x == 4) else 0)
