Better way to create a new column based on values of other columns
What is a better way to create the column that the code below tries to build?
col_new = []
for r1 in df['col_A']:
    if r1==1:
        for r2 in df['col_B']:
            if r2!='None':
                col_new.append('col_new')
df['col_new'] = col_new
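My DataFrame is large (120k rows × 22 columns), and running the code above makes the notebook hang. Is there a faster, more efficient way to create this column, which should mark every non-null value of col_B where col_A is 1?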
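As a rough illustration of why the loop hangs: for every row where col_A is 1 it rescans all of col_B, so the number of appends is the product of the two counts rather than one per row. The sketch below runs the same loop on a four-row toy frame (the data is made up for illustration):

```python
import pandas as pd

# Toy data, assumed for illustration only
df = pd.DataFrame({
    'col_A': [1, 1, 2, 1],
    'col_B': ['a', 'None', 'None', 'a'],
})

col_new = []
for r1 in df['col_A']:
    if r1 == 1:
        # The inner loop rescans all of col_B for each matching row
        for r2 in df['col_B']:
            if r2 != 'None':
                col_new.append('col_new')

n_ones = (df['col_A'] == 1).sum()            # rows with col_A == 1
n_not_none = (df['col_B'] != 'None').sum()   # rows with col_B != 'None'

# The list length is the product of the two counts, not the row count
print(len(col_new), n_ones * n_not_none, len(df))
```

Here the list ends up with 6 entries for a 4-row frame, so assigning it to df['col_new'] would fail with a length-mismatch ValueError. On 120k rows the same product can run into the billions.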
I believe you need to create a boolean mask and then assign through DataFrame.loc:
mask = (df['col_A'] == 1) & (df['col_B']!='None')
# if None is a real missing value rather than the string 'None':
# mask = (df['col_A'] == 1) & (df['col_B'].notnull())
df.loc[mask, 'col_new'] = 'col_new'
Sample, where the column contains the string 'None':
import pandas as pd

df = pd.DataFrame({
    'col_A': [1,1,2,1],
    'col_B': ['a','None','None','a']
})
print (df)
   col_A col_B
0      1     a
1      1  None
2      2  None
3      1     a
mask = (df['col_A'] == 1) & (df['col_B']!='None')
df.loc[mask, 'col_new'] = 'val'
print (df)
   col_A col_B col_new
0      1     a     val
1      1  None     NaN
2      2  None     NaN
3      1     a     val
If the column holds real None values rather than the string 'None', use Series.notna:
df = pd.DataFrame({
    'col_A': [1,1,2,1],
    'col_B': ['a',None,None,'a']
})
print (df)
   col_A col_B
0      1     a
1      1  None
2      2  None
3      1     a
mask = (df['col_A'] == 1) & (df['col_B'].notna())
# older pandas versions
# mask = (df['col_A'] == 1) & (df['col_B'].notnull())
df.loc[mask, 'col_new'] = 'val'
print (df)
   col_A col_B col_new
0      1     a     val
1      1  None     NaN
2      2  None     NaN
3      1     a     val
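If a column might mix the string 'None' with real missing values, the two checks above can be combined into one mask. This is only a sketch on made-up data, and it assumes both representations should be treated as "empty":

```python
import pandas as pd

# Toy data mixing the string 'None' with a real None (assumed for illustration)
df = pd.DataFrame({
    'col_A': [1, 1, 2, 1],
    'col_B': ['a', 'None', None, 'a'],
})

# Keep rows where col_A is 1 and col_B is neither missing nor the string 'None'
mask = (df['col_A'] == 1) & df['col_B'].notna() & (df['col_B'] != 'None')
df.loc[mask, 'col_new'] = 'val'
print(df)
```

Rows 0 and 3 get 'val'; the other rows are left as NaN in col_new.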
Alternatively, if you need if-else logic, use numpy.where:
import numpy as np

df['col_new'] = np.where(mask, 'val', 'another_val')
print (df)
   col_A col_B      col_new
0      1     a          val
1      1  None  another_val
2      2  None  another_val
3      1     a          val