I have the following DataFrame df:
S
2011-01-26 1
2011-01-27 0
2011-01-28 0
2011-01-29 0
2011-01-30 0
2011-01-31 0
2011-02-01 0
2011-02-02 0
2011-02-03 0
2011-02-04 0
2011-02-05 0
2011-02-06 0
2011-02-07 0
2011-02-08 0
2011-02-09 0
I am trying to generate the following DataFrame from df:
S S1 S2 S3
2011-01-26 1 0 0 0
2011-01-27 0 1 0 0
2011-01-28 0 1 0 0
2011-01-29 0 0 1 0
2011-01-30 0 0 1 0
2011-01-31 0 0 1 0
2011-02-01 0 0 1 0
2011-02-02 0 0 0 1
2011-02-03 0 0 0 1
2011-02-04 0 0 0 1
2011-02-05 0 0 0 1
2011-02-06 0 0 0 1
2011-02-07 0 0 0 1
2011-02-08 0 0 0 1
2011-02-09 0 0 0 1
You can see that each new column starts where the previous one ends, and its run of 1s is twice as long (2, 4, 8 rows). Is there a function in Pandas, like fillna, with which I can specify to fill downwards for x rows?
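For the narrower "fill downwards for x rows" part: fillna has a limit parameter, and ffill (forward fill) accepts the same limit. A minimal sketch, assuming the 0s can be treated as gaps: turn them into NaN, forward-fill at most x rows, then turn the remaining gaps back into 0. This fills one column only; it does not by itself split the rows into the S1/S2/S3 blocks.

```python
import pandas as pd

# Hypothetical single-column example: one 1 followed by zeros.
s = pd.Series([1, 0, 0, 0, 0, 0])

x = 2  # number of rows to fill downward below each 1

# Treat 0 as "missing", forward-fill at most x rows, then restore the 0s.
filled = s.where(s == 1).ffill(limit=x).fillna(0).astype(int)
print(filled.tolist())  # [1, 1, 1, 0, 0, 0]
```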
UPDATE: In fact, I have a more complicated task.
If this is my df:
S
2011-01-26 1
2011-01-27 0
2011-01-28 0
2011-01-29 0
2011-01-30 0
2011-01-31 0
2011-02-01 0
2011-02-02 0
2011-02-03 0
2011-02-04 0
2011-02-05 0
2011-02-06 0
2011-02-07 0
2011-02-08 0
2011-02-09 0
... (all zeros)
S
2011-04-26 1
2011-04-27 0
2011-04-28 0
2011-04-29 0
2011-04-30 0
2011-04-31 0
2011-05-01 0
2011-05-02 0
2011-05-03 0
2011-05-04 0
2011-05-05 0
2011-05-06 0
2011-05-07 0
2011-05-08 0
2011-05-09 0
and I need this:
S S1 S2 S3
2011-01-26 1 0 0 0
2011-01-27 0 1 0 0
2011-01-28 0 1 0 0
2011-01-29 0 0 1 0
2011-01-30 0 0 1 0
2011-01-31 0 0 1 0
2011-02-01 0 0 1 0
2011-02-02 0 0 0 1
2011-02-03 0 0 0 1
2011-02-04 0 0 0 1
2011-02-05 0 0 0 1
2011-02-06 0 0 0 1
2011-02-07 0 0 0 1
2011-02-08 0 0 0 1
2011-02-09 0 0 0 1
... (all zeros everywhere)
S S1 S2 S3
2011-04-26 1 0 0 0
2011-04-27 0 1 0 0
2011-04-28 0 1 0 0
2011-04-29 0 0 1 0
2011-04-30 0 0 1 0
2011-04-31 0 0 1 0
2011-05-01 0 0 1 0
2011-05-02 0 0 0 1
2011-05-03 0 0 0 1
2011-05-04 0 0 0 1
2011-05-05 0 0 0 1
2011-05-06 0 0 0 1
2011-05-07 0 0 0 1
2011-05-08 0 0 0 1
2011-05-09 0 0 0 1
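A vectorized sketch (not taken from either answer below), assuming every block starts on a row where S == 1 and the frame is in date order. S.cumsum() labels each block, groupby(...).cumcount() gives each row's position p inside its block, and because column Sk covers positions 2**k - 1 through 2**(k+1) - 2, the column number is floor(log2(p + 1)). The sample data, the cap m = 3 and the names block, pos and level are illustrative assumptions; rows past position 14 in a block get 0 in every column, as in the desired output.

```python
import numpy as np
import pandas as pd

# Hypothetical data shaped like the question: two blocks, each starting with S == 1.
idx = pd.date_range('2011-01-26', periods=40, freq='D')
df = pd.DataFrame({'S': 0}, index=idx)
df.iloc[[0, 20], 0] = 1

m = 3  # number of columns S1..Sm to build

# Label each block by a running count of the 1s, then number rows within it.
block = df['S'].cumsum()
pos = df.groupby(block).cumcount().to_numpy()

# Row at position p belongs to column floor(log2(p + 1)); using the integer
# bit_length avoids floating-point rounding in log2.
level = np.array([int(p + 1).bit_length() - 1 for p in pos])

for k in range(1, m + 1):
    df['S{}'.format(k)] = (level == k).astype(int)

print(df.head(16))
```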
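To the best of my knowledge, there is no ready-made function for this, but the following trick does something similar: split the frame into groups that each start at a 1 (using the cumulative sum of the column as the group label), then, inside each group, mark the rows at positions 2**k - 1 and one-hot encode the resulting block numbers.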
import pandas as pd
import numpy as np
# your data
# ========================================
df = pd.DataFrame(0, index=pd.date_range('2015-01-01', periods=100, freq='D'), columns=['col'])
df.iloc[[0, 71], 0] = 1
# group_keys=False keeps the original DatetimeIndex in the result of apply
grouped = df.groupby(df.col.cumsum(), group_keys=False)
grouped.get_group(1)
Out[275]:
col
2015-01-01 1
2015-01-02 0
2015-01-03 0
2015-01-04 0
2015-01-05 0
2015-01-06 0
2015-01-07 0
2015-01-08 0
... ...
2015-03-05 0
2015-03-06 0
2015-03-07 0
2015-03-08 0
2015-03-09 0
2015-03-10 0
2015-03-11 0
2015-03-12 0
[71 rows x 1 columns]
grouped.get_group(2)
Out[276]:
col
2015-03-13 1
2015-03-14 0
2015-03-15 0
2015-03-16 0
2015-03-17 0
2015-03-18 0
2015-03-19 0
2015-03-20 0
... ...
2015-04-03 0
2015-04-04 0
2015-04-05 0
2015-04-06 0
2015-04-07 0
2015-04-08 0
2015-04-09 0
2015-04-10 0
[29 rows x 1 columns]
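Why grouping by col.cumsum() works: the running sum goes up by one at every 1 and stays flat over the 0s that follow, so each 1 and its trailing zeros share one label (rows before the first 1, if any, would get label 0). A tiny check on a hypothetical Series:

```python
import pandas as pd

# Hypothetical series: a 1 starts each segment.
s = pd.Series([1, 0, 0, 1, 0, 1, 0, 0, 0])

labels = s.cumsum()
print(labels.tolist())  # [1, 1, 1, 2, 2, 3, 3, 3, 3]

# Segment sizes, i.e. the lengths get_group would return for each label.
print(s.groupby(labels).size().tolist())  # [3, 2, 4]
```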
# processing
# ==================================
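def func(group):
    # positions 0, 1, 3, 7, 15, ... (2**k - 1) each start a new block
    starts = 2 ** np.arange(int(np.log2(len(group))) + 1) - 1
    temp = np.zeros(len(group), dtype=int)
    temp[starts] = 1
    # cumsum numbers the blocks 1, 2, 3, ...; get_dummies makes one column per block
    new_col = pd.Series(temp.cumsum(), index=group.index)
    return pd.get_dummies(new_col, dtype=int)
grouped.apply(func)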
Out[281]:
1 2 3 4 5 6 7
2015-01-01 1 0 0 0 0 0 0
2015-01-02 0 1 0 0 0 0 0
2015-01-03 0 1 0 0 0 0 0
2015-01-04 0 0 1 0 0 0 0
2015-01-05 0 0 1 0 0 0 0
2015-01-06 0 0 1 0 0 0 0
2015-01-07 0 0 1 0 0 0 0
2015-01-08 0 0 0 1 0 0 0
... .. .. .. .. .. .. ..
2015-04-03 0 0 0 0 1 NaN NaN
2015-04-04 0 0 0 0 1 NaN NaN
2015-04-05 0 0 0 0 1 NaN NaN
2015-04-06 0 0 0 0 1 NaN NaN
2015-04-07 0 0 0 0 1 NaN NaN
2015-04-08 0 0 0 0 1 NaN NaN
2015-04-09 0 0 0 0 1 NaN NaN
2015-04-10 0 0 0 0 1 NaN NaN
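The NaN in the last columns appear because the second group (29 rows) is too short to reach blocks 6 and 7, so its get_dummies result has fewer columns; when the per-group pieces are stacked, the missing columns are filled with NaN. Assuming zeros are wanted there, fillna(0) followed by astype(int) restores an all-integer frame. A self-contained illustration with two hypothetical pieces:

```python
import pandas as pd

# Two hypothetical per-group results with different numbers of block columns.
a = pd.get_dummies(pd.Series([1, 2, 2, 3]), dtype=int)  # columns 1, 2, 3
b = pd.get_dummies(pd.Series([1, 2, 2]), dtype=int)     # columns 1, 2

stacked = pd.concat([a, b], ignore_index=True)  # column 3 is NaN for b's rows
clean = stacked.fillna(0).astype(int)
print(clean)
```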
I think it's easier to specify how many powers of 2 to use, i.e. how many columns S1, S2, ... to create.
I wrote a function to do this:
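def square(d, m):
    # d is the DataFrame; m is the number of columns S1..Sm to create.
    # Column Sk is 1 on the 2**k rows that follow the rows covered by S(k-1).
    r = 0
    for item in range(1, m + 1):
        col = 'S{}'.format(item)
        r += 2 ** item
        d[col] = 0
        # positional rows r - 2**item + 1 .. r (.ix was removed in pandas 1.0)
        d.iloc[r - 2 ** item + 1:r + 1, d.columns.get_loc(col)] = 1
    return d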
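The slice arithmetic in square can be checked on its own. After column k, r equals 2 + 4 + ... + 2**k = 2**(k+1) - 2, so column Sk covers positions 2**k - 1 through 2**(k+1) - 2 (inclusive), and a block needs 2**(m+1) - 1 rows including the initial 1, which is 15 for m = 3 and matches the chunk=15 used in the update. A small hypothetical helper, block_bounds, that reproduces the loop's bounds and checks them against that closed form:

```python
def block_bounds(m):
    """Return the inclusive (start, end) row positions used for S1..Sm."""
    bounds = []
    r = 0
    for k in range(1, m + 1):
        r += 2 ** k
        start, end = r - 2 ** k + 1, r  # same bounds as the slice in square()
        # closed form: start = 2**k - 1, end = 2**(k + 1) - 2
        assert (start, end) == (2 ** k - 1, 2 ** (k + 1) - 2)
        bounds.append((start, end))
    return bounds

print(block_bounds(3))  # [(1, 2), (3, 6), (7, 14)]
```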
Output:
In [71]: data
Out[71]:
S
2011-01-26 1
2011-01-27 0
2011-01-28 0
2011-01-29 0
2011-01-30 0
2011-01-31 0
2011-02-01 0
2011-02-02 0
2011-02-03 0
2011-02-04 0
2011-02-05 0
2011-02-06 0
2011-02-07 0
2011-02-08 0
2011-02-09 0
In [72]: square(data,3)
Out[72]:
S S1 S2 S3
2011-01-26 1 0 0 0
2011-01-27 0 1 0 0
2011-01-28 0 1 0 0
2011-01-29 0 0 1 0
2011-01-30 0 0 1 0
2011-01-31 0 0 1 0
2011-02-01 0 0 1 0
2011-02-02 0 0 0 1
2011-02-03 0 0 0 1
2011-02-04 0 0 0 1
2011-02-05 0 0 0 1
2011-02-06 0 0 0 1
2011-02-07 0 0 0 1
2011-02-08 0 0 0 1
2011-02-09 0 0 0 1
UPDATED: a version for several blocks of chunk rows each:
def square(d, m, chunk):
    # d is the DataFrame, m is the number of columns S1..Sm to create,
    # chunk is the number of rows in each block (each block starts with S == 1)
    for i in range(len(d) // chunk):
        r = i * chunk  # positional offset of the current block
        for item in range(1, m + 1):
            col = 'S{}'.format(item)
            r += 2 ** item
            if col not in d.columns:
                d[col] = 0
            d.iloc[r - 2 ** item + 1:r + 1, d.columns.get_loc(col)] = 1
    return d
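The update assumes every block has exactly chunk rows. If the blocks vary in length, one possible variant (a hypothetical square_by_start, not the answerer's code) locates each block start with np.flatnonzero on S == 1 and clips each column's run at the next start, so a short block cannot spill into the one after it. It works on a copy rather than modifying the input in place.

```python
import numpy as np
import pandas as pd

def square_by_start(d, m):
    """Hypothetical variant of square(): blocks start wherever S == 1,
    so they need not all have the same length."""
    d = d.copy()
    starts = np.flatnonzero(d['S'].to_numpy() == 1)
    # Each block ends just before the next start (or at the end of the frame).
    ends = np.append(starts[1:], len(d))
    for k in range(1, m + 1):
        col = 'S{}'.format(k)
        d[col] = 0
        j = d.columns.get_loc(col)
        for s, e in zip(starts, ends):
            # Sk covers offsets 2**k - 1 .. 2**(k+1) - 2 from the block start.
            lo = s + 2 ** k - 1
            hi = min(s + 2 ** (k + 1) - 1, e)  # exclusive, clipped to the block
            if lo < hi:
                d.iloc[lo:hi, j] = 1
    return d

# Usage on hypothetical data: a short block (10 rows) followed by a longer one.
idx = pd.date_range('2011-01-26', periods=25, freq='D')
data = pd.DataFrame({'S': 0}, index=idx)
data.iloc[[0, 10], 0] = 1
out = square_by_start(data, 3)
print(out)
```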
Output:
In [99]: data = pd.read_clipboard()
In [100]: data
Out[100]:
S
2011-01-26 1
2011-01-27 0
2011-01-28 0
2011-01-29 0
2011-01-30 0
2011-01-31 0
2011-02-01 0
2011-02-02 0
2011-02-03 0
2011-02-04 0
2011-02-05 0
2011-02-06 0
2011-02-07 0
2011-02-08 0
2011-02-09 0
2011-04-26 1
2011-04-27 0
2011-04-28 0
2011-04-29 0
2011-04-30 0
2011-04-31 0
2011-05-01 0
2011-05-02 0
2011-05-03 0
2011-05-04 0
2011-05-05 0
2011-05-06 0
2011-05-07 0
2011-05-08 0
2011-05-09 0
In [101]: square(data,3,15)
Out[101]:
S S1 S2 S3
2011-01-26 1 0 0 0
2011-01-27 0 1 0 0
2011-01-28 0 1 0 0
2011-01-29 0 0 1 0
2011-01-30 0 0 1 0
2011-01-31 0 0 1 0
2011-02-01 0 0 1 0
2011-02-02 0 0 0 1
2011-02-03 0 0 0 1
2011-02-04 0 0 0 1
2011-02-05 0 0 0 1
2011-02-06 0 0 0 1
2011-02-07 0 0 0 1
2011-02-08 0 0 0 1
2011-02-09 0 0 0 1
2011-04-26 1 0 0 0
2011-04-27 0 1 0 0
2011-04-28 0 1 0 0
2011-04-29 0 0 1 0
2011-04-30 0 0 1 0
2011-04-31 0 0 1 0
2011-05-01 0 0 1 0
2011-05-02 0 0 0 1
2011-05-03 0 0 0 1
2011-05-04 0 0 0 1
2011-05-05 0 0 0 1
2011-05-06 0 0 0 1
2011-05-07 0 0 0 1
2011-05-08 0 0 0 1
2011-05-09 0 0 0 1