Pandas Dataframe of staggered zeros

Question

I'm building a Monte Carlo model and need to model how many new items I capture each month over a given number of months. Each month I add a random number of items drawn from a normal distribution with a known mean and standard deviation.

import numpy as np
import pandas as pd

months = ['2017-03', '2017-04', '2017-05']
new = np.random.normal(4, 3, size=len(months)).round()
print(new)

[ 1.  5.  4.]

df_new = pd.DataFrame(list(zip(months, new)), columns=['Period', 'newPats'])
print(df_new)

    Period  newPats
0  2017-03      1.0
1  2017-04      5.0
2  2017-05      4.0

I need to transform this into an item x month dataframe, where each value is 0 until the month in which the given item starts and 1 from that month on.

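Here's what I have so far, a frame of ones with the right shape:

# new.sum() is a float, so cast it to int before using it as a dimension
df_full = pd.DataFrame(np.ones((int(new.sum()), len(months))), columns=months)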

   2017-03  2017-04  2017-05
0      1.0      1.0      1.0
1      1.0      1.0      1.0
2      1.0      1.0      1.0
3      1.0      1.0      1.0
4      1.0      1.0      1.0
5      1.0      1.0      1.0
6      1.0      1.0      1.0
7      1.0      1.0      1.0
8      1.0      1.0      1.0
9      1.0      1.0      1.0

and here's the output I need:

# perform transformation
print(df_out)

   2017-03  2017-04  2017-05
0        1        1        1
1        0        1        1
2        0        1        1
3        0        1        1
4        0        1        1
5        0        1        1
6        0        0        1
7        0        0        1
8        0        0        1
9        0        0        1

The rule: 1 item was added in 2017-03, so the first record is 1 in every period. The next 5 items were added in 2017-04, so they are 0 in the prior period and 1 afterwards. The final 4 items were added in 2017-05, so they are 1 only in the last month. This feeds a Monte Carlo simulation that will be run thousands of times, so I can't afford to loop over the rows and columns by hand. Any suggestions for a vectorized way to handle this?

Answer 1

Beat you all to it.

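total = int(new.sum())
# running totals as plain ints, since list repetition needs integer counts
cum = [int(c) for c in new.cumsum()]
# each month's column is 1 for every item captured so far, then 0 for the rest
df_out = pd.DataFrame([[1] * c + [0] * (total - c) for c in cum]).transpose()
df_out.columns = months

print(df_out)

   2017-03  2017-04  2017-05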
0        1        1        1
1        0        1        1
2        0        1        1
3        0        1        1
4        0        1        1
5        0        1        1
6        0        0        1
7        0        0        1
8        0        0        1
9        0        0        1
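
Since the question asks for something vectorized, here is a rough sketch of another way to build the same frame with np.repeat and broadcasting instead of list repetition. This is not from the answer above. The names start and mask are just illustrative, and the counts are hard-coded to the question's sample values (1, 5, 4) so that the result is reproducible.

```python
import numpy as np
import pandas as pd

months = ['2017-03', '2017-04', '2017-05']
# Assumption: fixed integer counts copied from the question's sample output.
# In the simulation these would come from the random draw instead.
new = np.array([1, 5, 4])

# start[i] is the column index of the month in which item i is first captured
start = np.repeat(np.arange(len(months)), new)

# Broadcasting a (1 x months) row against an (items x 1) column:
# cell (i, j) is True when month j is at or after item i's start month
mask = np.arange(len(months)) >= start[:, None]

df_out = pd.DataFrame(mask.astype(int), columns=months)
print(df_out)
```

For real draws the counts would need to be non-negative integers before they go into np.repeat. A normal draw with mean 4 and standard deviation 3 can come out negative, so something like np.clip(new, 0, None).astype(int) is probably needed first, depending on how the model should treat those months.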
