Add rows to groups in a pandas DataFrame

I've got a pandas DataFrame df, created like this:

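import numpy as np
import pandas as pd

a = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])  # group labels
bcd = np.array([np.arange(1, 10)] * 3).T   # three identical columns holding 1..9
df = pd.DataFrame(bcd, columns=["b", "c", "d"])
df["a"] = a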

It looks like this:

     b   c   d   a
0    1   1   1   0
1    2   2   2   0
2    3   3   3   0
3    4   4   4   1
4    5   5   5   1
5    6   6   6   1
6    7   7   7   2
7    8   8   8   2
8    9   9   9   2

I would like to insert 3 rows after each group in column 'a'. Specifically, I want the new rows to carry auto-incremented values in column 'b' and None objects in the remaining columns ('c' and 'd'). Something like:

     b   c    d    a 
0    1   1    1    0
1    2   2    2    0
2    3   3    3    0
3    10  None None 0
4    11  None None 0
5    12  None None 0
6    4   4    4    1
7    5   5    5    1
8    6   6    6    1
9    10  None None 1
10   11  None None 1
11   12  None None 1
12   7   7    7    2
13   8   8    8    2
14   9   9    9    2
15   10  None None 2
16   11  None None 2
17   12  None None 2

What you want to do is not really an insert operation, because the data structure behind a DataFrame does not support inserting rows in place. In essence, you have to build a new DataFrame from pieces of the old one.

So, your code should:

  1. Create a new DataFrame
  2. Find where to split the table (by using column a)
  3. Append the slice from the existing table to the new DataFrame
  4. Create new bits of data
  5. Append the new data to the new DataFrame
  6. Repeat steps 2-5 as many times as required.

(Or you can concatenate instead of appending, if you find it easier.) One thing to consider is what to do with the index. If you do not need the original index values, you can have new ones generated by passing the ignore_index=True keyword argument to concat or append. Note that DataFrame.append was removed in pandas 2.0, so on current versions pd.concat is the one to use.
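
The steps above can be sketched roughly as follows. This is only an illustration, not the answerer's own code: it uses groupby to find the split points and pd.concat to assemble the pieces, and the names n_new, start, pieces and filler are made up for the example. Starting the counter at the maximum of 'b' plus one is also an assumption, chosen because it reproduces the 10, 11, 12 values in the question.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
a = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
bcd = np.array([np.arange(1, 10)] * 3).T
df = pd.DataFrame(bcd, columns=["b", "c", "d"])
df["a"] = a

n_new = 3                        # rows to add after each group (assumption)
start = int(df["b"].max()) + 1   # first auto-incremented value for "b" (assumption)

pieces = []
for key, group in df.groupby("a", sort=False):
    # Steps 2-3: keep the slice that belongs to this group.
    pieces.append(group)
    # Steps 4-5: new rows; "c" and "d" are omitted so concat fills them with NaN.
    filler = pd.DataFrame({"b": np.arange(start, start + n_new), "a": key})
    pieces.append(filler)

# ignore_index=True discards the old index and numbers the rows 0..n-1.
result = pd.concat(pieces, ignore_index=True)
print(result)
```

Collecting the pieces in a list and calling concat once tends to be cheaper than growing a DataFrame inside the loop, since every concat copies the data.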

For more information:

http://pandas.pydata.org/pandas-docs/dev/merging.html

BTW, you do not necessarily want any None values in your DataFrame. If you have numerical data, you want NaN instead. Otherwise strange things may happen (you end up with object arrays). See:

http://pandas.pydata.org/pandas-docs/stable/missing_data.html
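
A small sketch of the difference, using made-up column names obj_col and num_col. The object dtype is forced here on purpose to mimic a column that really holds None objects; in many constructors pandas would quietly convert None to NaN for you.

```python
import numpy as np
import pandas as pd

# A column that really stores None has to be an object array.
obj_col = pd.Series(np.array([1, 2, None], dtype=object))

# With NaN as the missing-value marker the column stays numeric.
num_col = pd.Series([1, 2, np.nan])

print(obj_col.dtype)   # object
print(num_col.dtype)   # float64
print(num_col.mean())  # 1.5, NaN is skipped by default
```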

Just concat the rows you want to insert onto the end of df (df.append(the_insert) does the same thing in pandas versions that still have it), then sort by 'a' and reset_index the result to get things in the right order:

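In [137]:

df2 = pd.DataFrame({'b': [11, 12, 13], 'a': [0] * 3})

In [138]:

# DataFrame.sort was removed from pandas; sort_values replaces it.
# A stable sort (mergesort) keeps the original rows ahead of the inserted ones.
df3 = pd.concat((df, df2)).sort_values('a', kind='mergesort').reset_index(drop=True)
# pd.concat((df, df2, df3, df4, ...)) works the same way for further inserts

In [139]:

print(df3)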
    a   b   c   d
0   0   1   1   1
1   0   2   2   2
2   0   3   3   3
3   0  11 NaN NaN
4   0  12 NaN NaN
5   0  13 NaN NaN
6   1   4   4   4
7   1   5   5   5
8   1   6   6   6
9   2   7   7   7
10  2   8   8   8
11  2   9   9   9
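
The example above only adds rows to group 0. One possible way to extend the same concat-and-sort idea to every group is sketched below. The names n_new, new_b, keys, inserts and df_all are illustrative, and the values 10, 11, 12 are taken from the desired output in the question rather than from this answer.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
a = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
bcd = np.array([np.arange(1, 10)] * 3).T
df = pd.DataFrame(bcd, columns=["b", "c", "d"])
df["a"] = a

n_new = 3                           # rows to add per group (assumption)
new_b = np.arange(10, 10 + n_new)   # 10, 11, 12 as in the question
keys = df["a"].unique()             # one entry per group, in order of appearance

# One block of new rows per group: "b" repeats 10..12, "a" carries the group label.
inserts = pd.DataFrame({
    "b": np.tile(new_b, len(keys)),
    "a": np.repeat(keys, n_new),
})

df_all = (
    pd.concat([df, inserts])
    .sort_values("a", kind="mergesort")  # stable: original rows stay ahead of inserts
    .reset_index(drop=True)
)
print(df_all)
```

The stable sort matters here: with an unstable algorithm the relative order of original and inserted rows inside a group would not be guaranteed.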
