I have a dataframe which can be created with this:
import pandas as pd
import numpy as np
#create df
data={'id':['a','b','c','d'],
'cd':[0,4,1,3],
'ddf':[2,5,2,5],
0:np.nan,
1:np.nan,
2:np.nan,
3:np.nan,
4:np.nan,
5:np.nan,
6:np.nan
}
df=pd.DataFrame.from_dict(data)[['id','cd','ddf',0,1,2,3,4,5,6]]
and looks like this:
df
Out[35]:
id cd ddf 0 1 2 3 4 5 6
0 a 0 2 NaN NaN NaN NaN NaN NaN NaN
1 b 4 5 NaN NaN NaN NaN NaN NaN NaN
2 c 1 2 NaN NaN NaN NaN NaN NaN NaN
3 d 3 5 NaN NaN NaN NaN NaN NaN NaN
What I want to do is calculate the difference between the column name for columns 0,1,2,3,4,5,6 and df['cd'] -->IF the column name is >= to df['cd']
AND the column name is <= to df['ddf']
. The resulting df
should look like:
df
Out[45]:
id cd ddf 0 1 2 3 4 5 6
0 a 0 2 0.0 1.0 2.0 NaN NaN NaN NaN
1 b 4 5 NaN NaN NaN NaN 0.0 1.0 NaN
2 c 1 2 NaN 0.0 1.0 NaN NaN NaN NaN
3 d 3 5 NaN NaN NaN 0.0 1.0 2.0 NaN
I have successfully filled the first part of the IF clause using:
df.loc[:,j]=(j-i[:,None])
where:
i=df.cd.values
j=[0,1,2,3,4,5,6]
But having a problem doing the " column name is <= to df['ddf']
" part. Ideally we can do both together. Speed will be very important as the full dataframe is very large at >100m rows and j
having a length of roughly 4,000.
Here is one way using numpy
broadcast
s1=df.cd.values
s2=df.ddf.values
s=df.columns[3:].values
t=(s1[:,None]-s<=0)&(s2[:,None]-s>=0)
updf=pd.DataFrame(t.cumsum(axis=1),columns=s,index=df.index)
df.update((updf-1).where(t))
df
Out[590]:
id cd ddf 0 1 2 3 4 5 6
0 a 0 2 0.0 1.0 2.0 NaN NaN NaN NaN
1 b 4 5 NaN NaN NaN NaN 0.0 1.0 NaN
2 c 1 2 NaN 0.0 1.0 NaN NaN NaN NaN
3 d 3 5 NaN NaN NaN 0.0 1.0 2.0 NaN
here is a way
i=df.cd.values
j=[0,1,2,3,4,5,6]
df.loc[:,j]=(j-i[:,None])
print(df)
for c in j :
for l in range(df.shape[0]) :
if c < df.cd[l] or c > df.ddf[l] :
df[c][l] = np.nan
df
output :
id cd ddf 0 1 2 3 4 5 6
0 a 0 2 0.0 1.0 2.0 NaN NaN NaN NaN
1 b 4 5 NaN NaN NaN NaN 0.0 1.0 NaN
2 c 1 2 NaN 0.0 1.0 NaN NaN NaN NaN
3 d 3 5 NaN NaN NaN 0.0 1.0 2.0 NaN
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.