Better way to create a new column based on values of other columns
What is a better way to create the column shown below?
col_new = []
for r1 in df['col_A']:
    if r1 == 1:
        for r2 in df['col_B']:
            if r2 != 'None':
                col_new.append('col_new')
df['col_new'] = col_new
My dataframe is huge (120k * 22) and running the above code hangs the notebook. Is there a faster and more efficient way to create this column, so that it represents all the non-null values of col_B when col_A is 1?
I believe you need to create a boolean mask and then assign the value with DataFrame.loc:
mask = (df['col_A'] == 1) & (df['col_B']!='None')
# if the missing values are real None/NaN rather than the string 'None', use:
#mask = (df['col_A'] == 1) & (df['col_B'].notnull())
df.loc[mask, 'col_new'] = 'col_new'
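To illustrate why the nested loops in the question stall while the mask does not, here is a minimal sketch on a synthetic frame. The 120k row count comes from the question; the random values and the helper names `is_one`, `has_value` and `appends_in_loop` are assumptions made only for this illustration. The loops append once for every pair of a matching col_A row and a matching col_B row, so the work grows with the product of the two counts, and the resulting list no longer has one entry per row.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the 120k-row frame from the question (values are made up).
rng = np.random.default_rng(0)
n = 120_000
df = pd.DataFrame({
    'col_A': rng.integers(1, 3, size=n),          # each value is 1 or 2
    'col_B': rng.choice(['a', 'None'], size=n),   # the string 'None' marks a missing value
})

is_one = df['col_A'] == 1
has_value = df['col_B'] != 'None'

# The nested loops append once per (matching col_A row, matching col_B row) pair.
# Compute that count without actually running the loops.
appends_in_loop = int(is_one.sum()) * int(has_value.sum())

# The vectorized version evaluates each row once.
mask = is_one & has_value
df.loc[mask, 'col_new'] = 'col_new'

print('appends the loops would perform:', appends_in_loop)
print('rows actually flagged:', int(mask.sum()))
```

The first number is far larger than the number of rows, which is why the notebook appears to hang, and why assigning that list back to the dataframe would fail on length anyway.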
Sample:
If the column contains the string 'None':
import pandas as pd

df = pd.DataFrame({
    'col_A': [1, 1, 2, 1],
    'col_B': ['a', 'None', 'None', 'a']
})
print (df)
   col_A col_B
0      1     a
1      1  None
2      2  None
3      1     a
mask = (df['col_A'] == 1) & (df['col_B']!='None')
df.loc[mask, 'col_new'] = 'val'
print (df)
   col_A col_B col_new
0      1     a     val
1      1  None     NaN
2      2  None     NaN
3      1     a     val
If the column contains real None values rather than the string 'None', use Series.notna:
df = pd.DataFrame({
    'col_A': [1, 1, 2, 1],
    'col_B': ['a', None, None, 'a']
})
print (df)
   col_A col_B
0      1     a
1      1  None
2      2  None
3      1     a
mask = (df['col_A'] == 1) & (df['col_B'].notna())
# for older pandas versions without notna:
#mask = (df['col_A'] == 1) & (df['col_B'].notnull())
df.loc[mask, 'col_new'] = 'val'
print (df)
   col_A col_B col_new
0      1     a     val
1      1  None     NaN
2      2  None     NaN
3      1     a     val
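If it is not clear which kind of missing marker the column holds, or it holds a mix of both, the two checks can be combined. The following is a minimal sketch under that assumption; the five-row sample data and the `has_value` name are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample mixing the string 'None' with real missing values.
df = pd.DataFrame({
    'col_A': [1, 1, 2, 1, 1],
    'col_B': ['a', 'None', None, np.nan, 'b'],
})

# A row has a usable value only if it is neither missing nor the string 'None'.
has_value = df['col_B'].notna() & df['col_B'].ne('None')
mask = df['col_A'].eq(1) & has_value

df.loc[mask, 'col_new'] = 'val'
print(df)
```

Only the first and last rows satisfy both conditions here, so only they receive 'val'.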
Also, if you want if-else style logic, numpy.where is really helpful:
import numpy as np

df['col_new'] = np.where(mask, 'val', 'another_val')
print (df)
   col_A col_B      col_new
0      1     a          val
1      1  None  another_val
2      2  None  another_val
3      1     a          val
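The question says the new column should represent the non-null values of col_B when col_A is 1. If that means carrying over the actual col_B values rather than a fixed label, which is only one possible reading, Series.where may be a good fit. It keeps values where the mask is True and puts NaN elsewhere. A minimal sketch with made-up sample data:

```python
import pandas as pd

# Hypothetical sample; row 2 has a value in col_B but col_A is not 1.
df = pd.DataFrame({
    'col_A': [1, 1, 2, 1],
    'col_B': ['a', None, 'b', 'c'],
})

mask = (df['col_A'] == 1) & df['col_B'].notna()

# Keep the col_B value where the mask holds, NaN everywhere else.
df['col_new'] = df['col_B'].where(mask)
print(df)
```

Unlike numpy.where, this needs no second value, because rows that fail the mask are simply left missing.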