Add rows to groups in pandas dataframe
I've got a pandas dataframe df, created like this:
import numpy as np
import pandas as pd

a = np.array([0,0,0,1,1,1,2,2,2]).T
bcd = np.array([np.arange(1,10)]*3).T
df = pd.DataFrame(bcd, columns=["b","c","d"])
df["a"] = a
Looks like this:
   b  c  d  a
0  1  1  1  0
1  2  2  2  0
2  3  3  3  0
3  4  4  4  1
4  5  5  5  1
5  6  6  6  1
6  7  7  7  2
7  8  8  8  2
8  9  9  9  2
I would like to insert 3 rows after each grouping in column 'a'. Specifically, I want to have some auto-incrementation in column 'b' and put None objects everywhere else. Something like:
     b     c     d  a
0    1     1     1  0
1    2     2     2  0
2    3     3     3  0
3   10  None  None  0
4   11  None  None  0
5   12  None  None  0
6    4     4     4  1
7    5     5     5  1
8    6     6     6  1
9   10  None  None  1
10  11  None  None  1
11  12  None  None  1
12   7     7     7  2
13   8     8     8  2
14   9     9     9  2
15  10  None  None  2
16  11  None  None  2
17  12  None  None  2
What you want to do is not really an insert operation, as the data structure behind the DataFrame does not allow simple inserting. So, in essence, you will have to build a new DataFrame from the pieces of your old DataFrame.
So, your code should:
- create a new DataFrame
- find the splitting positions of your table (by using column a)
- append a slice from the existing table to the new DataFrame
- append the new data to the new DataFrame
(Or you can concatenate instead of append, if you find it easier.) One thing to think about is what you do with your indices. If you do not use them, you can ignore them (that is, have new ones created as needed) by passing the ignore_index=True keyword argument to concat or append.
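The steps above can be sketched roughly as follows. This is only one possible reading of them, not the answerer's own code: it uses groupby to find the split positions and pd.concat rather than append (DataFrame.append was removed in pandas 2.0). The names N_EXTRA, START_B, pieces and filler are made up for illustration, and the starting value 10 for 'b' is assumed from the desired output in the question.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame(np.array([np.arange(1, 10)] * 3).T, columns=["b", "c", "d"])
df["a"] = np.repeat([0, 1, 2], 3)

N_EXTRA = 3   # rows to add after each group (from the question)
START_B = 10  # first auto-incremented 'b' value (assumed from the desired output)

pieces = []
for key, group in df.groupby("a", sort=False):
    # Filler rows: 'b' counts up and 'a' repeats the group key.
    # reindex adds the remaining columns ('c', 'd') filled with NaN.
    filler = pd.DataFrame(
        {"b": np.arange(START_B, START_B + N_EXTRA), "a": key}
    ).reindex(columns=df.columns)
    pieces.extend([group, filler])

# ignore_index=True drops the old row labels and numbers the rows 0..n-1.
result = pd.concat(pieces, ignore_index=True)
print(result)
```

Because the filler rows hold NaN, columns 'c' and 'd' come out as float64 in the result, while 'b' and 'a' stay integer.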
For more information:
http://pandas.pydata.org/pandas-docs/dev/merging.html
BTW, you do not necessarily want to have any Nones in your dataframe. If you have numerical data, you want to have NaN instead. Otherwise strange things may happen (you end up with object arrays). See:
http://pandas.pydata.org/pandas-docs/stable/missing_data.html
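A small illustration of the object-array point, as a sketch only. The variable names are made up for the example:

```python
import numpy as np
import pandas as pd

# NumPy has no numeric representation for None, so the array becomes dtype=object.
none_values = np.array([1, 2, None])
# NaN is a regular float, so the array stays numeric.
nan_values = np.array([1, 2, np.nan])

print(none_values.dtype)  # object
print(nan_values.dtype)   # float64

# A Series built from each array keeps the array's dtype.
none_col = pd.Series(none_values)
nan_col = pd.Series(nan_values)
print(none_col.dtype, nan_col.dtype)

# Numeric reductions skip NaN by default.
print(nan_col.mean())  # 1.5
```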
Just concat the rows that you want to insert (they will be appended at the end; df.append(the_insert) does the same thing), then sort by 'a' and reset_index the result to get things in the right order:
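In [137]:
# rows to insert after the a == 0 group; 'c' and 'd' are left out and become NaN
df2 = pd.DataFrame({'b': [11, 12, 13], 'a': [0]*3})
In [138]:
# sort=True orders the columns alphabetically (a, b, c, d);
# mergesort is stable, so the original rows stay ahead of the inserted ones
df3 = pd.concat((df, df2), sort=True).sort_values('a', kind='mergesort').reset_index(drop=True)
#pd.concat((df, df2, df3, df4...., all the others...))
In [139]:
print(df3)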
    a   b    c    d
0   0   1    1    1
1   0   2    2    2
2   0   3    3    3
3   0  11  NaN  NaN
4   0  12  NaN  NaN
5   0  13  NaN  NaN
6   1   4    4    4
7   1   5    5    5
8   1   6    6    6
9   2   7    7    7
10  2   8    8    8
11  2   9    9    9
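To cover every group at once rather than only a == 0, the same concat-then-sort idea might be extended as below. This is a sketch under assumptions, not part of the original answer: the names keys, inserts and out are made up, the 'b' values 10..12 are taken from the question's desired output, and a stable sort (kind="mergesort") keeps each group's original rows ahead of its inserted rows.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame(np.array([np.arange(1, 10)] * 3).T, columns=["b", "c", "d"])
df["a"] = np.repeat([0, 1, 2], 3)

keys = df["a"].unique()

# One block of three filler rows per group; 'c' and 'd' are omitted
# here, so concat fills them with NaN.
inserts = pd.DataFrame({
    "b": np.tile(np.arange(10, 13), len(keys)),
    "a": np.repeat(keys, 3),
})

out = (
    pd.concat((df, inserts))
    .sort_values("a", kind="mergesort")  # stable: ties keep their concat order
    .reset_index(drop=True)
)
print(out)
```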