
Add rows to groups in a pandas DataFrame

I've got a pandas DataFrame df, created like this:

import numpy as np
import pandas as pd

a = np.array([0,0,0,1,1,1,2,2,2])

bcd = np.array([np.arange(1,10)]*3).T

df = pd.DataFrame(bcd, columns=["b","c","d"])

df["a"] = a

It looks like this:

     b   c   d   a
0    1   1   1   0
1    2   2   2   0
2    3   3   3   0
3    4   4   4   1
4    5   5   5   1
5    6   6   6   1
6    7   7   7   2
7    8   8   8   2
8    9   9   9   2

I would like to insert 3 rows after each group in column 'a'. Specifically, I want the new rows to carry an auto-incremented value in column 'b' and None objects in the other data columns. Something like:

     b   c    d    a 
0    1   1    1    0
1    2   2    2    0
2    3   3    3    0
3    10  None None 0
4    11  None None 0
5    12  None None 0
6    4   4    4    1
7    5   5    5    1
8    6   6    6    1
9    10  None None 1
10   11  None None 1
11   12  None None 1
12   7   7    7    2
13   8   8    8    2
14   9   9    9    2
15   10  None None 2
16   11  None None 2
17   12  None None 2

What you want to do is not really an insert operation, because the data structure behind a DataFrame does not allow simple insertion. In essence, you have to build a new DataFrame from the pieces of your old DataFrame.

So, your code should:

  1. Create a new DataFrame
  2. Find where to split the table (by using column a)
  3. Append the slice from the existing table to the new DataFrame
  4. Create the new rows of data
  5. Append the new rows to the new DataFrame
  6. Repeat steps 2-5 as many times as required.

(Or you can concatenate instead of append, if you find it easier. Note that DataFrame.append was removed in pandas 2.0, so on current versions concat is the way to go.) One thing to think about is what you do with your indices. If you do not use them, you can ignore them (new ones are created as needed) by passing the ignore_index=True keyword argument to concat or append.
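The steps above can be sketched roughly as follows. This is a minimal sketch, not the only way to do it: the helper name add_rows_per_group and its parameters key, n_new and start are made up for illustration, and it assumes the new 'b' values restart at 10 for every group, as in the desired output in the question.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame(np.array([np.arange(1, 10)] * 3).T, columns=["b", "c", "d"])
df["a"] = np.repeat([0, 1, 2], 3)


def add_rows_per_group(frame, key="a", n_new=3, start=10):
    """Hypothetical helper: put n_new filler rows after each group of `key`."""
    pieces = []
    for value, group in frame.groupby(key, sort=False):
        # New rows: 'b' counts up from `start`, the key value is repeated,
        # and the columns that are not listed come out as NaN after concat.
        filler = pd.DataFrame({"b": np.arange(start, start + n_new), key: value})
        pieces.append(group)
        pieces.append(filler)
    # ignore_index=True throws away the old labels and renumbers the rows.
    return pd.concat(pieces, ignore_index=True)


result = add_rows_per_group(df)
print(result)
```

Collecting the pieces in a list and calling concat once avoids copying the growing frame on every iteration, which is what repeated appending would do.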

For more information:

http://pandas.pydata.org/pandas-docs/dev/merging.html

BTW, you do not necessarily want to have any None objects in your DataFrame. If you have numerical data, you want NaN instead. Otherwise strange things may happen (you end up with object arrays). See:

http://pandas.pydata.org/pandas-docs/stable/missing_data.html
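A small illustration of the difference, as a sketch: the variable names are made up, and the object-dtype case is forced here by going through a NumPy array that contains None.

```python
import numpy as np
import pandas as pd

# A NumPy array holding None has dtype object, and the Series keeps that dtype.
with_none = pd.Series(np.array([1, 2, None]))
# With NaN the column stays numeric.
with_nan = pd.Series([1, 2, np.nan])

print(with_none.dtype)  # object
print(with_nan.dtype)   # float64
```

Numeric operations on the float64 column run on a plain NumPy float array, while the object column stores generic Python objects.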

Just concat the rows that you want to insert (they will be appended at the end; on older pandas versions df.append(the_insert) does the same thing), then sort by 'a' and reset_index the result to get things in the right order:

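In [137]:

df2 = pd.DataFrame({'b': [11, 12, 13], 'a': [0]*3})
In [138]:

# sort_values replaces the old .sort('a'); a stable sort keeps the new rows after the existing ones
df3 = pd.concat((df, df2)).sort_values('a', kind='mergesort').reset_index(drop=True)
# pd.concat((df, df2, df3, df4...., all the others...))
In [139]:

# the output below is from an older pandas; newer versions keep the column order b, c, d, a
print(df3)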
    a   b   c   d
0   0   1   1   1
1   0   2   2   2
2   0   3   3   3
3   0  11 NaN NaN
4   0  12 NaN NaN
5   0  13 NaN NaN
6   1   4   4   4
7   1   5   5   5
8   1   6   6   6
9   2   7   7   7
10  2   8   8   8
11  2   9   9   9
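To cover every group rather than only a == 0, the same concat-and-sort idea can be extended along these lines. This is a sketch under assumptions: the names keys, n_new and inserts are made up, and the new 'b' values are taken to be 10, 11, 12 in each group, as in the question.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame(np.array([np.arange(1, 10)] * 3).T, columns=["b", "c", "d"])
df["a"] = np.repeat([0, 1, 2], 3)

keys = df["a"].unique()
n_new = 3

# One block of new rows per group: 'b' runs 10, 11, 12 and 'a' repeats the key.
inserts = pd.DataFrame({
    "b": np.tile(np.arange(10, 10 + n_new), len(keys)),
    "a": np.repeat(keys, n_new),
})

# mergesort is stable, so within each group the original rows
# stay ahead of the inserted ones.
combined = (
    pd.concat([df, inserts])
    .sort_values("a", kind="mergesort")
    .reset_index(drop=True)
)
print(combined)
```

The stable sort matters here: with the default sort algorithm the relative order of rows that share the same 'a' value is not guaranteed.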

