Python: Pandas generating downward filling variables in DataFrame

I have the following DataFrame df :

                S   
2011-01-26      1
2011-01-27      0
2011-01-28      0
2011-01-29      0
2011-01-30      0
2011-01-31      0
2011-02-01      0
2011-02-02      0
2011-02-03      0
2011-02-04      0
2011-02-05      0
2011-02-06      0
2011-02-07      0
2011-02-08      0
2011-02-09      0

I am trying to generate the following DataFrame from df :

                S  S1 S2 S3   
2011-01-26      1  0  0  0
2011-01-27      0  1  0  0
2011-01-28      0  1  0  0
2011-01-29      0  0  1  0
2011-01-30      0  0  1  0
2011-01-31      0  0  1  0
2011-02-01      0  0  1  0
2011-02-02      0  0  0  1
2011-02-03      0  0  0  1
2011-02-04      0  0  0  1
2011-02-05      0  0  0  1
2011-02-06      0  0  0  1
2011-02-07      0  0  0  1
2011-02-08      0  0  0  1
2011-02-09      0  0  0  1

You can see that the run of 1s in each column is twice as long as in the previous column (2, 4, 8 rows) and starts where the previous run ends. Is there a function in Pandas, like fillna, that lets me specify how many rows to fill downward?

UPDATE: In fact, my task is more complicated.

If this is my df :

                S   
2011-01-26      1
2011-01-27      0
2011-01-28      0
2011-01-29      0
2011-01-30      0
2011-01-31      0
2011-02-01      0
2011-02-02      0
2011-02-03      0
2011-02-04      0
2011-02-05      0
2011-02-06      0
2011-02-07      0
2011-02-08      0
2011-02-09      0
...         (all zeros)
                    S   
2011-04-26      1
2011-04-27      0
2011-04-28      0
2011-04-29      0
2011-04-30      0
2011-04-31      0
2011-05-01      0
2011-05-02      0
2011-05-03      0
2011-05-04      0
2011-05-05      0
2011-05-06      0
2011-05-07      0
2011-05-08      0
2011-05-09      0

and I need this:

                S  S1 S2 S3   
2011-01-26      1  0  0  0
2011-01-27      0  1  0  0
2011-01-28      0  1  0  0
2011-01-29      0  0  1  0
2011-01-30      0  0  1  0
2011-01-31      0  0  1  0
2011-02-01      0  0  1  0
2011-02-02      0  0  0  1
2011-02-03      0  0  0  1
2011-02-04      0  0  0  1
2011-02-05      0  0  0  1
2011-02-06      0  0  0  1
2011-02-07      0  0  0  1
2011-02-08      0  0  0  1
2011-02-09      0  0  0  1
...         (all zeros everywhere)
                    S  S1 S2 S3   
2011-04-26      1  0  0  0
2011-04-27      0  1  0  0
2011-04-28      0  1  0  0
2011-04-29      0  0  1  0
2011-04-30      0  0  1  0
2011-04-31      0  0  1  0
2011-05-01      0  0  1  0
2011-05-02      0  0  0  1
2011-05-03      0  0  0  1
2011-05-04      0  0  0  1
2011-05-05      0  0  0  1
2011-05-06      0  0  0  1
2011-05-07      0  0  0  1
2011-05-08      0  0  0  1
2011-05-09      0  0  0  1

To the best of my knowledge, there is no ready-made function for this, but the following trick does something similar.

import pandas as pd
import numpy as np

# your data
# ========================================
df = pd.DataFrame(0, index=pd.date_range('2015-01-01', periods=100, freq='D'), columns=['col'])
df.iloc[[0, 71], 0] = 1

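# each 1 in col starts a new group; group_keys=False keeps the original date index in the apply() result
grouped = df.groupby(df.col.cumsum(), group_keys=False)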

grouped.get_group(1)

Out[275]: 
            col
2015-01-01    1
2015-01-02    0
2015-01-03    0
2015-01-04    0
2015-01-05    0
2015-01-06    0
2015-01-07    0
2015-01-08    0
...         ...
2015-03-05    0
2015-03-06    0
2015-03-07    0
2015-03-08    0
2015-03-09    0
2015-03-10    0
2015-03-11    0
2015-03-12    0

[71 rows x 1 columns]

grouped.get_group(2)

Out[276]: 
            col
2015-03-13    1
2015-03-14    0
2015-03-15    0
2015-03-16    0
2015-03-17    0
2015-03-18    0
2015-03-19    0
2015-03-20    0
...         ...
2015-04-03    0
2015-04-04    0
2015-04-05    0
2015-04-06    0
2015-04-07    0
2015-04-08    0
2015-04-09    0
2015-04-10    0

[29 rows x 1 columns]

# processing
# ==================================

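def func(group):
    group = group.copy()  # work on a copy instead of the slice handed over by groupby
    # positions 0, 1, 3, 7, ... (2**k - 1) are where each new run starts
    starts = 2 ** np.arange(int(np.log2(len(group))) + 1) - 1
    group['temp'] = 0
    group.iloc[starts, group.columns.get_loc('temp')] = 1  # avoids chained assignment
    group['new_col'] = group['temp'].cumsum()
    return pd.get_dummies(group['new_col'], dtype=int)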


grouped.apply(func)

Out[281]: 
            1  2  3  4  5   6   7
2015-01-01  1  0  0  0  0   0   0
2015-01-02  0  1  0  0  0   0   0
2015-01-03  0  1  0  0  0   0   0
2015-01-04  0  0  1  0  0   0   0
2015-01-05  0  0  1  0  0   0   0
2015-01-06  0  0  1  0  0   0   0
2015-01-07  0  0  1  0  0   0   0
2015-01-08  0  0  0  1  0   0   0
...        .. .. .. .. ..  ..  ..
2015-04-03  0  0  0  0  1 NaN NaN
2015-04-04  0  0  0  0  1 NaN NaN
2015-04-05  0  0  0  0  1 NaN NaN
2015-04-06  0  0  0  0  1 NaN NaN
2015-04-07  0  0  0  0  1 NaN NaN
2015-04-08  0  0  0  0  1 NaN NaN
2015-04-09  0  0  0  0  1 NaN NaN
2015-04-10  0  0  0  0  1 NaN NaN
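
For a fixed number of columns, combining shift with ffill(limit=...) comes fairly close to the "fill downward for x rows" behaviour asked about in the question. The following is only a minimal sketch under some assumptions: the data is rebuilt as a toy frame shaped like the question's, and n_cols is a hypothetical name for the number of S columns wanted.

```python
import pandas as pd

# Toy data shaped like the question: 15 daily rows with a single 1 at the top.
df = pd.DataFrame({'S': [1] + [0] * 14},
                  index=pd.date_range('2011-01-26', periods=15, freq='D'))

marker = df['S'].where(df['S'] == 1)  # 1.0 where S == 1, NaN elsewhere

n_cols = 3  # hypothetical: number of S columns to build
for k in range(1, n_cols + 1):
    width = 2 ** k        # S1 spans 2 rows, S2 spans 4, S3 spans 8
    offset = width - 1    # S1 starts 1 row below the marker, S2 3 rows, S3 7 rows
    df['S{}'.format(k)] = (
        marker.shift(offset)            # move the marker to the first row of the run
              .ffill(limit=width - 1)   # fill the remaining width - 1 rows downward
              .fillna(0)
              .astype(int)
    )
```

With several 1s in S this should still work as long as consecutive 1s are at least 2 ** (n_cols + 1) - 1 rows apart; if they are closer, a run spills over into the next block.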

I think it's easier to specify how many powers of 2 (that is, how many S columns) you want.

I wrote a function to do this:

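def square(d,m):

    # d is the DataFrame, m is the number of columns to add (S1 .. Sm);
    # column Sk gets a run of 2**k ones

    r = 0

    for item in range(1,m+1):

        col = 'S{}'.format(item)
        r += 2 ** item
        d[col] = 0
        # .ix was removed in pandas 1.0; .iloc does the same positional slicing
        d.iloc[(r - 2 ** item + 1):r + 1, d.columns.get_loc(col)] = 1

    return d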

Output:

In [71]: data
Out[71]: 
            S
2011-01-26  1
2011-01-27  0
2011-01-28  0
2011-01-29  0
2011-01-30  0
2011-01-31  0
2011-02-01  0
2011-02-02  0
2011-02-03  0
2011-02-04  0
2011-02-05  0
2011-02-06  0
2011-02-07  0
2011-02-08  0
2011-02-09  0

In [72]: square(data,3)
Out[72]: 
            S  S1  S2  S3
2011-01-26  1   0   0   0
2011-01-27  0   1   0   0
2011-01-28  0   1   0   0
2011-01-29  0   0   1   0
2011-01-30  0   0   1   0
2011-01-31  0   0   1   0
2011-02-01  0   0   1   0
2011-02-02  0   0   0   1
2011-02-03  0   0   0   1
2011-02-04  0   0   0   1
2011-02-05  0   0   0   1
2011-02-06  0   0   0   1
2011-02-07  0   0   0   1
2011-02-08  0   0   0   1
2011-02-09  0   0   0   1

UPDATED :

def square(d,m,chunk):

    # chunk is the number of rows in each block you're operating on

    for k in range(len(d) // chunk):

        r = k * chunk  # positional offset of the current block

        for item in range(1,m+1):

            col = 'S{}'.format(item)
            r += 2 ** item

            if col not in d.columns:
                d[col] = 0

            # .ix was removed in pandas 1.0; .iloc does the same positional slicing
            d.iloc[(r - 2 ** item + 1):r + 1, d.columns.get_loc(col)] = 1

    return d

Output:

In [99]: data = pd.read_clipboard()

In [100]: data
Out[100]: 
            S
2011-01-26  1
2011-01-27  0
2011-01-28  0
2011-01-29  0
2011-01-30  0
2011-01-31  0
2011-02-01  0
2011-02-02  0
2011-02-03  0
2011-02-04  0
2011-02-05  0
2011-02-06  0
2011-02-07  0
2011-02-08  0
2011-02-09  0
2011-04-26  1
2011-04-27  0
2011-04-28  0
2011-04-29  0
2011-04-30  0
2011-04-31  0
2011-05-01  0
2011-05-02  0
2011-05-03  0
2011-05-04  0
2011-05-05  0
2011-05-06  0
2011-05-07  0
2011-05-08  0
2011-05-09  0

In [101]: square(data,3,15)
Out[101]: 
            S  S1  S2  S3
2011-01-26  1   0   0   0
2011-01-27  0   1   0   0
2011-01-28  0   1   0   0
2011-01-29  0   0   1   0
2011-01-30  0   0   1   0
2011-01-31  0   0   1   0
2011-02-01  0   0   1   0
2011-02-02  0   0   0   1
2011-02-03  0   0   0   1
2011-02-04  0   0   0   1
2011-02-05  0   0   0   1
2011-02-06  0   0   0   1
2011-02-07  0   0   0   1
2011-02-08  0   0   0   1
2011-02-09  0   0   0   1
2011-04-26  1   0   0   0
2011-04-27  0   1   0   0
2011-04-28  0   1   0   0
2011-04-29  0   0   1   0
2011-04-30  0   0   1   0
2011-04-31  0   0   1   0
2011-05-01  0   0   1   0
2011-05-02  0   0   0   1
2011-05-03  0   0   0   1
2011-05-04  0   0   0   1
2011-05-05  0   0   0   1
2011-05-06  0   0   0   1
2011-05-07  0   0   0   1
2011-05-08  0   0   0   1
2011-05-09  0   0   0   1
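
If the blocks do not all have the same length, the fixed chunk argument can probably be avoided by deriving each row's position inside its block from the 1s in S. This is a sketch rather than a drop-in replacement: the toy data (40 rows, a 1 at positions 0 and 20) and the name n_cols are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Toy data: 40 daily rows with a 1 at positions 0 and 20 (blocks of unequal use).
df = pd.DataFrame({'S': 0}, index=pd.date_range('2011-01-26', periods=40, freq='D'))
df.iloc[[0, 20], 0] = 1

# Position of each row inside its block: 0 on every row where S == 1.
pos = df.groupby(df['S'].cumsum()).cumcount().to_numpy()

# np.frexp returns the binary exponent e with x = m * 2**e and 0.5 <= m < 1,
# so e - 1 equals floor(log2(x)) exactly: position 0 -> 0, 1-2 -> 1, 3-6 -> 2, 7-14 -> 3.
level = np.frexp(pos + 1)[1] - 1

n_cols = 3  # hypothetical: number of S columns to build
for k in range(1, n_cols + 1):
    df['S{}'.format(k)] = (level == k).astype(int)
```

Rows beyond position 14 in a block fall into level 4 or higher, so they stay 0 in S1 to S3, which matches the "all zeros" stretch between the two blocks in the question.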
