Using a condition to split a pandas column of lists into multiple columns
I have a pandas DataFrame with two columns that looks like the following:
import pandas as pd

d1 = {'Time1': [[93, 109, 187], [159], [94, 96, 154, 169]],
      'Time2': [[16, 48, 66, 128], [123, 136], [40, 177, 192]]}
df = pd.DataFrame(d1)
I need to split these columns of lists into 4 columns named 1st_half_T1, 2nd_half_T1, 1st_half_T2 and 2nd_half_T2 using pandas. The condition is that a value from Time1 goes into 1st_half_T1 if Time <= 96 and into 2nd_half_T1 if Time > 96. Applying the same condition to Time2 gives the following output:
1st_half_T1 2nd_half_T1 1st_half_T2 2nd_half_T2
0 [93] [109, 187] [16, 48, 66] [128]
1 [] [159] [] [123, 136]
2 [94, 96] [154, 169] [40] [177, 192]
Use list comprehensions with the DataFrame constructor:
t11 = [[y for y in x if y <=96] for x in df['Time1']]
t12 = [[y for y in x if y >96] for x in df['Time1']]
t21 = [[y for y in x if y <=96] for x in df['Time2']]
t22 = [[y for y in x if y >96] for x in df['Time2']]
df = pd.DataFrame({'1st_half_T1':t11, '2nd_half_T1':t12,'1st_half_T2':t21, '2nd_half_T2':t22})
print (df)
1st_half_T1 2nd_half_T1 1st_half_T2 2nd_half_T2
0 [93] [109, 187] [16, 48, 66] [128]
1 [] [159] [] [123, 136]
2 [94, 96] [154, 169] [40] [177, 192]
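The four comprehensions above differ only in the source column and the comparison, so they can be generated in a loop. The following is a minimal sketch of that idea, not part of the original answer; the names `cutoff`, `suffixes`, `halves` and `result` are hypothetical and chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Time1': [[93, 109, 187], [159], [94, 96, 154, 169]],
    'Time2': [[16, 48, 66, 128], [123, 136], [40, 177, 192]],
})

cutoff = 96  # threshold taken from the question
# Hypothetical mapping from source column to the suffix used in the output names.
suffixes = {'Time1': 'T1', 'Time2': 'T2'}

halves = {}
for col, suffix in suffixes.items():
    # Values up to and including the cutoff go into the first half.
    halves[f'1st_half_{suffix}'] = [[y for y in x if y <= cutoff] for x in df[col]]
    # Values strictly above the cutoff go into the second half.
    halves[f'2nd_half_{suffix}'] = [[y for y in x if y > cutoff] for x in df[col]]

# Reuse the original index so the rows line up with df.
result = pd.DataFrame(halves, index=df.index)
print(result)
```

Keeping the cutoff in one variable means a different threshold only has to be changed in one place.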
Alternatively, starting from the original df, filter each column with Series.apply and assign the results to a new DataFrame:
df_new = pd.DataFrame()
df_new['1st_half_T1'] = df['Time1'].apply(lambda x: [y for y in x if y <= 96])
df_new['2nd_half_T1'] = df['Time1'].apply(lambda x: [y for y in x if y > 96])
df_new['1st_half_T2'] = df['Time2'].apply(lambda x: [y for y in x if y <= 96])
df_new['2nd_half_T2'] = df['Time2'].apply(lambda x: [y for y in x if y > 96])
df_new
Out[64]:
1st_half_T1 2nd_half_T1 1st_half_T2 2nd_half_T2
0 [93] [109, 187] [16, 48, 66] [128]
1 [] [159] [] [123, 136]
2 [94, 96] [154, 169] [40] [177, 192]
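Whichever approach is used, the two halves of a row should together contain exactly the values of the original list. A quick sanity check along those lines might look like the sketch below; it is an illustrative addition, and the names `halves` and `ok` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'Time1': [[93, 109, 187], [159], [94, 96, 154, 169]],
    'Time2': [[16, 48, 66, 128], [123, 136], [40, 177, 192]],
})

# Split Time1 with the same condition as in the answers (cutoff 96).
halves = pd.DataFrame({
    '1st_half_T1': df['Time1'].apply(lambda x: [y for y in x if y <= 96]),
    '2nd_half_T1': df['Time1'].apply(lambda x: [y for y in x if y > 96]),
})

# For every row, the two halves combined must hold the same values as the source list.
ok = all(
    sorted(first + second) == sorted(original)
    for first, second, original in zip(
        halves['1st_half_T1'], halves['2nd_half_T1'], df['Time1']
    )
)
print(ok)
```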
Use apply with a custom function:
def my_split(row):
    return pd.Series({
        '1st_half_T1': [i for i in row.Time1 if i <= 96],
        '2nd_half_T1': [i for i in row.Time1 if i > 96],
        '1st_half_T2': [i for i in row.Time2 if i <= 96],
        '2nd_half_T2': [i for i in row.Time2 if i > 96]
    })

df.apply(my_split, axis=1)
Out[]:
1st_half_T1 1st_half_T2 2nd_half_T1 2nd_half_T2
0 [93] [16, 48, 66] [109, 187] [128]
1 [] [] [159] [123, 136]
2 [94, 96] [40] [154, 169] [177, 192]
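The columns in the output above are not in the order asked for in the question. If the order matters, selecting the columns explicitly after the apply call pins it down. This is a small sketch of that step, assuming the same `my_split` function; the list name `cols` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'Time1': [[93, 109, 187], [159], [94, 96, 154, 169]],
    'Time2': [[16, 48, 66, 128], [123, 136], [40, 177, 192]],
})

def my_split(row):
    # Same split as in the answer: <= 96 is the first half, > 96 the second.
    return pd.Series({
        '1st_half_T1': [i for i in row.Time1 if i <= 96],
        '2nd_half_T1': [i for i in row.Time1 if i > 96],
        '1st_half_T2': [i for i in row.Time2 if i <= 96],
        '2nd_half_T2': [i for i in row.Time2 if i > 96],
    })

# Desired column order from the question.
cols = ['1st_half_T1', '2nd_half_T1', '1st_half_T2', '2nd_half_T2']
result = df.apply(my_split, axis=1)[cols]
print(result)
```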
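Notice: The technical posts on this site follow the CC BY-SA 4.0 license. If you repost this content, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.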