Pandas: Iterate over existing columns and create new columns based on conditionals
The closest existing question to mine is found here, but I'm running into a hiccup somewhere.
My dataframe:
import numpy as np
import pandas as pd

df = pd.DataFrame({'KEY': ['100000003', '100000009', '100000009', '100000009'],
                   'RO_1': [1, 1, 4, 1],
                   'RO_2': [1, 0, 0, 0],
                   'RO_3': [1, 1, 1, 1],
                   'RO_4': [1, 4, 1, 1]})
KEY RO_1 RO_2 RO_3 RO_4
0 100000003 1 1 1 1
1 100000009 1 0 1 4
2 100000009 4 0 1 1
3 100000009 1 0 1 1
I want to create four additional columns labeled 'Month1' through 'Month4'. Something simple like:
for i in range(3):
    df.loc[1,'Month'+str(i)] = 1 # '1' is just there as a place holder
However, I get a warning message when I execute this code:
"A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead"
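The post does not show how df was built, so the cause of the warning can only be guessed at. One common cause is that the frame was itself obtained by filtering another frame. A minimal sketch under that assumption (the name `full` and the filter on KEY are hypothetical) takes an explicit copy before adding columns:

```python
import pandas as pd

# Hypothetical parent frame; the question does not show where df comes from.
full = pd.DataFrame({'KEY': ['100000003', '100000009', '100000009', '100000009'],
                     'RO_1': [1, 1, 4, 1],
                     'RO_2': [1, 0, 0, 0]})

# .copy() makes df an independent object, so later assignments
# are not treated as writes to a slice of `full`.
df = full[full['KEY'] == '100000009'].copy()

for i in range(1, 3):
    df['Month' + str(i)] = 1  # placeholder value for every row
```

Here the new columns are added to df only, and `full` is left unchanged.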
I want to combine this with conditionals to fill in each cell for each column and each row.
The code below creates only one column, flagged according to whether an RO_ column meets either condition:
namelist = df.columns.tolist()  # Index.get_values() was removed in pandas 1.0
ROList = [s for s in namelist if "RO_" in s]
for col in ROList:
    for i in range(3):
        df['Month'] = np.where(np.logical_or(df[col]==4,df[col]==1), '1', '0')
df
I tried combining the two pieces of code, but I am missing a fundamental understanding of how to do this. Any help would be great.
Final expected result:
KEY RO_1 RO_2 RO_3 RO_4 Month1 Month2 Month3 Month4
0 100000003 1 1 1 1 1 1 1 1
1 100000009 1 0 1 4 1 0 1 1
2 100000009 4 0 1 1 1 0 1 1
3 100000009 1 0 1 1 1 0 1 1
IIUC, use enumerate:
namelist = df.columns.tolist()  # Index.get_values() was removed in pandas 1.0
ROList = [s for s in namelist if "RO_" in s]
for i,col in enumerate(ROList):
    df['Month'+str(i+1)] = np.where(np.logical_or(df[col]==4,df[col]==1), '1', '0')
df
Out[194]:
KEY RO_1 RO_2 RO_3 RO_4 Month1 Month2 Month3 Month4
0 100000003 1 1 1 1 1 1 1 1
1 100000009 1 0 1 4 1 0 1 1
2 100000009 4 0 1 1 1 0 1 1
3 100000009 1 0 1 1 1 0 1 1
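Note that the loop above stores the flags as the strings '1' and '0', which print the same as integers. If integer flags are preferred, a variant along these lines may work. It is a sketch on the question's sample data, and the name `ro_cols` is introduced here for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'KEY': ['100000003', '100000009', '100000009', '100000009'],
                   'RO_1': [1, 1, 4, 1],
                   'RO_2': [1, 0, 0, 0],
                   'RO_3': [1, 1, 1, 1],
                   'RO_4': [1, 4, 1, 1]})

# Collect the RO_ columns once, before any Month columns are added.
ro_cols = [c for c in df.columns if c.startswith('RO_')]

# enumerate(..., start=1) numbers the new columns from Month1.
for i, col in enumerate(ro_cols, start=1):
    df['Month' + str(i)] = np.where(df[col].isin([1, 4]), 1, 0)
```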
Your logic seems to amount to changing 4 to 1:
df.assign(**df.loc[:,ROList].mask(df.loc[:,ROList]==4,1).rename(columns=dict(zip(ROList,list(range(1,len(ROList)+1))))).add_prefix('Month'))
Out[15]:
KEY RO_1 RO_2 RO_3 RO_4 Month1 Month2 Month3 Month4
0 100000003 1 1 1 1 1 1 1 1
1 100000009 1 0 1 4 1 0 1 1
2 100000009 4 0 1 1 1 0 1 1
3 100000009 1 0 1 1 1 0 1 1
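The mask approach matches the expected output only because the sample data contains nothing but 0, 1 and 4. A small sketch with a hypothetical value of 2, which does not occur in the question, shows where replacing 4 with 1 and flagging "1 or 4" diverge:

```python
import pandas as pd

# Hypothetical data: the value 2 is not part of the question's frame.
demo = pd.DataFrame({'RO_1': [1, 4, 2, 0]})

# mask: 4 becomes 1, every other value is kept as-is.
masked = demo['RO_1'].mask(demo['RO_1'] == 4, 1)

# isin: 1 where the value is 1 or 4, otherwise 0.
flagged = demo['RO_1'].isin([1, 4]).astype(int)

print(masked.tolist())   # [1, 1, 2, 0]
print(flagged.tolist())  # [1, 1, 0, 0]
```

If other values can appear in the RO_ columns, the condition-based flag is probably the safer reading of the question.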
Use filter + isin + rename for a single pipelined transformation of your data.
v = (df.filter(regex='^RO_')     # select columns
       .isin([4, 1])             # check if the value is 4 or 1
       .astype(int)              # convert the `bool` result to `int`
       .rename(                  # rename columns
           columns=lambda x: x.replace('RO_', 'Month')
       ))
Or, for the sake of performance,
v = df.filter(regex='^RO_')\
      .isin([4, 1])\
      .astype(int)
v.columns = v.columns.str.replace('RO_', 'Month')
Finally, concatenate the result with the original.
pd.concat([df, v], axis=1)
KEY RO_1 RO_2 RO_3 RO_4 Month1 Month2 Month3 Month4
0 100000003 1 1 1 1 1 1 1 1
1 100000009 1 0 1 4 1 0 1 1
2 100000009 4 0 1 1 1 0 1 1
3 100000009 1 0 1 1 1 0 1 1
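Put together on the question's sample frame, the filter, isin, rename and concat steps might look like the sketch below. The name `result` is introduced here for illustration. Because v is derived from df, the two share the same index, which is what pd.concat aligns on when axis=1:

```python
import pandas as pd

df = pd.DataFrame({'KEY': ['100000003', '100000009', '100000009', '100000009'],
                   'RO_1': [1, 1, 4, 1],
                   'RO_2': [1, 0, 0, 0],
                   'RO_3': [1, 1, 1, 1],
                   'RO_4': [1, 4, 1, 1]})

# Flag each RO_ column, then rename RO_n to Monthn.
v = (df.filter(regex='^RO_')
       .isin([4, 1])
       .astype(int)
       .rename(columns=lambda x: x.replace('RO_', 'Month')))

# Columns are placed side by side, with rows matched on the shared index.
result = pd.concat([df, v], axis=1)
print(result)
```

This leaves df itself untouched, so the result needs to be assigned back if the new columns should persist.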
It seems like you are creating a new column for each existing column in your dataframe. You can do something like:
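# Restrict to the RO_ columns; looping over all of df.columns would also create a 'MonthKEY' column
original_cols = [c for c in df.columns if c.startswith("RO_")]
for c in original_cols:
    cname = "Month" + c.split("_")[-1]
    df[cname] = df[c].apply(lambda x: 1 if (x == 1) or (x == 4) else 0)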