add rows to groups in pandas dataframe

Question

I've got a pandas DataFrame df, created like this:

import numpy as np
import pandas as pd

a = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
bcd = np.array([np.arange(1, 10)] * 3).T
df = pd.DataFrame(bcd, columns=["b", "c", "d"])
df["a"] = a

It looks like this:

     b   c   d   a
0    1   1   1   0
1    2   2   2   0
2    3   3   3   0
3    4   4   4   1
4    5   5   5   1
5    6   6   6   1
6    7   7   7   2
7    8   8   8   2
8    9   9   9   2

I would like to insert 3 rows after each group in column 'a'. Specifically, the new rows should auto-increment in column 'b', keep the group's value in column 'a', and hold None in the other columns. Something like:

     b   c    d    a 
0    1   1    1    0
1    2   2    2    0
2    3   3    3    0
3    10  None None 0
4    11  None None 0
5    12  None None 0
6    4   4    4    1
7    5   5    5    1
8    6   6    6    1
9    10  None None 1
10   11  None None 1
11   12  None None 1
12   7   7    7    2
13   8   8    8    2
14   9   9    9    2
15   10  None None 2
16   11  None None 2
17   12  None None 2

Answer 1

What you want to do is not really an insert operation, as the data structure behind the DataFrame does not allow simple inserting. So, in essence, you will have to build a new DataFrame from the pieces of your old DataFrame.

So, your code should:

1. Create a new DataFrame
2. Find where to split the table (by using column a)
3. Append the slice from the existing table to the new DataFrame
4. Create new bits of data
5. Append the new data to the new DataFrame
6. Repeat steps 2-5 as many times as required.

(Or you can concatenate instead of appending, if you find it easier.) One thing to think about is what to do with your indices. If you do not use them, you can ignore them (that is, have new ones created as needed) by passing the ignore_index=True keyword argument to concat or append.
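The steps above can be sketched roughly as follows. This is only one possible reading of them, not the answerer's own code: it uses groupby to find the split points and a single pd.concat at the end rather than repeated appends (DataFrame.append was removed in pandas 2.0). The names pieces, filler and result are made up for illustration, and the filler values 10, 11, 12 are taken from the desired output in the question.

```python
import numpy as np
import pandas as pd

# Rebuild the DataFrame from the question.
a = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
bcd = np.array([np.arange(1, 10)] * 3).T
df = pd.DataFrame(bcd, columns=["b", "c", "d"])
df["a"] = a

pieces = []  # hypothetical name: slices and filler rows, in final order
for key, group in df.groupby("a", sort=False):
    # Steps 2-3: take the slice of the old table for this group.
    pieces.append(group)
    # Steps 4-5: create the new rows; 'c' and 'd' are left out on purpose,
    # so concat fills them with NaN.
    filler = pd.DataFrame({"b": [10, 11, 12], "a": key})
    pieces.append(filler)

# ignore_index=True discards the old indices and numbers the rows 0..17.
result = pd.concat(pieces, ignore_index=True)
print(result)
```

Collecting the pieces in a list and concatenating once avoids copying the growing DataFrame on every iteration.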

For more information:

http://pandas.pydata.org/pandas-docs/dev/merging.html

BTW, you do not necessarily want to have any None values in your DataFrame. If you have numerical data, you want to have NaN instead. Otherwise strange things may happen (you end up with object arrays). See:

http://pandas.pydata.org/pandas-docs/stable/missing_data.html
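A small illustration of the difference, as a sketch only: the variable names with_none and with_nan are made up here, and dtype=object is passed explicitly to force None to be stored as-is.

```python
import numpy as np
import pandas as pd

# Keeping a literal None in the column requires the generic object dtype.
with_none = pd.Series([1, 2, None], dtype=object)

# NaN keeps the column numeric (float64).
with_nan = pd.Series([1, 2, np.nan])

print(with_none.dtype)  # object
print(with_nan.dtype)   # float64

# Both markers are recognised as missing by isna().
print(with_none.isna().tolist())

# Numeric reductions skip NaN by default.
print(with_nan.mean())
```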

Answer 2

Just concat the rows that you want to insert (they will be added at the end; df.append(the_insert) does the same thing), then sort by column 'a' and reset_index the result to get things in the right order:

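# Rows to insert for group a == 0; 'c' and 'd' are omitted and become NaN.
df2 = pd.DataFrame({'b': [11, 12, 13], 'a': [0] * 3})

# A stable sort (mergesort) keeps the new rows after the existing rows of each group.
# DataFrame.sort was removed from pandas; sort_values is its replacement.
df3 = pd.concat((df, df2)).sort_values('a', kind='mergesort').reset_index(drop=True)
# pd.concat((df, df2, df3, df4...., all the others...))

print(df3)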
    a   b   c   d
0   0   1   1   1
1   0   2   2   2
2   0   3   3   3
3   0  11 NaN NaN
4   0  12 NaN NaN
5   0  13 NaN NaN
6   1   4   4   4
7   1   5   5   5
8   1   6   6   6
9   2   7   7   7
10  2   8   8   8
11  2   9   9   9
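To cover every group at once rather than only a == 0, the same concat-and-sort idea might be extended along these lines. This is a sketch under assumptions: the names inserts and result are hypothetical, and the values 10, 11, 12 come from the desired output in the question rather than from this answer.

```python
import numpy as np
import pandas as pd

# Rebuild the DataFrame from the question.
a = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
bcd = np.array([np.arange(1, 10)] * 3).T
df = pd.DataFrame(bcd, columns=["b", "c", "d"])
df["a"] = a

groups = df["a"].unique()

# Three new rows per group: 'b' counts 10, 11, 12 and 'a' repeats the group key.
inserts = pd.DataFrame({
    "b": np.tile([10, 11, 12], len(groups)),
    "a": np.repeat(groups, 3),
})

# kind="mergesort" is a stable sort, so within each group the original rows
# stay ahead of the inserted ones.
result = (
    pd.concat([df, inserts])
    .sort_values("a", kind="mergesort")
    .reset_index(drop=True)
)
print(result)
```

The stable sort matters here: with the default sort algorithm the relative order of rows sharing the same 'a' value is not guaranteed.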


solution1
3 ACCEPTED 2014-07-08 21:31:15

solution2
2 2014-07-08 21:40:35
