
Can a pandas series be a column rather than a row?

This is a real question, though it may seem like splitting hairs at first glance. Basically, I want to treat a series as a column rather than a row, which I think makes intuitive sense even if a series cannot technically be divided into rows and columns (?), whereas 1d numpy arrays can. The example:

import pandas as pd

df = pd.DataFrame( { 'a' : [5,3,1],
                     'b' : [4,6,2],
                     'c' : [2,4,9] } )

df['rowsum'] = df.sum(axis=1)

In [31]: df
Out[31]: 
   a  b  c  rowsum
0  5  4  2      11
1  3  6  4      13
2  1  2  9      12

I just want to get percentages by row (so that each row sums to 1). I would like to do this:

df.iloc[:,0:3] / df.rowsum

which works fine in numpy (with reshape), since you can make rowsum a column or row vector. But here I can't reshape the series or use T on df.rowsum. It seems a dataframe can be transposed but not a series. The following works (along with several other solutions). It can also be coded naturally in numpy, but that involves converting to arrays and then back to a dataframe.

In [32]: ( df.iloc[:,0:3].T / df.rowsum ).T
Out[32]: 
          a         b         c
0  0.454545  0.363636  0.181818
1  0.230769  0.461538  0.307692
2  0.083333  0.166667  0.750000
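For comparison, the numpy route mentioned above might look roughly like the sketch below. The names values, rowsum_col and pct are only illustrative and are not from the question; the point is just that rowsum has to be reshaped into an explicit (3, 1) column and the result wrapped back into a dataframe by hand.

```python
import pandas as pd

df = pd.DataFrame({'a': [5, 3, 1],
                   'b': [4, 6, 2],
                   'c': [2, 4, 9]})
df['rowsum'] = df.sum(axis=1)

# Drop down to plain arrays; rowsum becomes an explicit (3, 1) column vector.
values = df.iloc[:, 0:3].to_numpy()
rowsum_col = df['rowsum'].to_numpy().reshape(-1, 1)

# NumPy broadcasts the (3, 1) column across the (3, 3) block,
# then the labels are reattached manually.
pct = pd.DataFrame(values / rowsum_col,
                   index=df.index,
                   columns=df.columns[0:3])
```

This is the round trip through arrays that the question would like to avoid.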

I'm sorry if this seems trivial, but it's valuable to be able to code in terms of rows and columns in an intuitive way. So the question is merely: can I make a series act like a column vector rather than a row vector?

Also, it seems inconsistent that this works fine on a single column:

df.iloc[:,0] / df.rowsum

In this case pandas appears to be dividing (elementwise) two column arrays (judging by the display, even if the row/column distinction is artificial). But when the first part of that expression is changed from a series to a dataframe, rowsum seems to effectively go from being a 3x1 to a 1x3. Is going from a series to a dataframe like an implicit transpose operation?

Maybe a better way to think about it:

all( dist.iloc[:,:10].index == dist.rowsum.index )
Out[1526]: True

The indexes line up here, so why does pandas seem to employ the index differently for series/series broadcasting than for dataframe/series broadcasting? Or am I just thinking about this completely wrong?!?
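A small sketch of the difference being described, using the example frame from the top of the question (the names col_ratio, misaligned and all_nan are only illustrative). As far as I understand the default behaviour, arithmetic between a dataframe and a series matches the series index against the dataframe's column labels, not its row index, while series/series arithmetic matches index to index.

```python
import pandas as pd

df = pd.DataFrame({'a': [5, 3, 1],
                   'b': [4, 6, 2],
                   'c': [2, 4, 9]})
df['rowsum'] = df.sum(axis=1)

# Series / Series: both operands are aligned on the row index (0, 1, 2),
# so this divides element by element as expected.
col_ratio = df['a'] / df['rowsum']

# DataFrame / Series: the Series index (0, 1, 2) is matched against the
# DataFrame's column labels ('a', 'b', 'c'). No labels match, so every
# cell of the result is NaN.
misaligned = df[['a', 'b', 'c']] / df['rowsum']
all_nan = misaligned.isna().all().all()
```

So the series is not being transposed; it is being aligned by label against a different axis.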

Try this:

df.apply(lambda x: x / x['rowsum'], axis=1)

          a         b         c  rowsum
0  0.454545  0.363636  0.181818     1.0
1  0.230769  0.461538  0.307692     1.0
2  0.083333  0.166667  0.750000     1.0

If you don't need the rowsum column, you can use

df.apply(lambda x: x / x.sum(), axis=1)  # with your original DataFrame

Try

df.iloc[:, 0:3].div(df.rowsum, axis=0)

to see if it's what you want.
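As a quick check, here is a sketch that compares the div call with the double-transpose version from the question, again on the example frame (pct and via_transpose are illustrative names). Passing axis=0 asks div to match the series index against the row labels, which is the "column vector" behaviour the question is after.

```python
import pandas as pd

df = pd.DataFrame({'a': [5, 3, 1],
                   'b': [4, 6, 2],
                   'c': [2, 4, 9]})
df['rowsum'] = df.sum(axis=1)

# axis=0 matches the Series index against the row labels,
# so each row is divided by its own rowsum.
pct = df.iloc[:, 0:3].div(df['rowsum'], axis=0)

# The double-transpose workaround from the question, for comparison.
via_transpose = (df.iloc[:, 0:3].T / df['rowsum']).T

# Largest absolute difference between the two results.
max_diff = (pct - via_transpose).abs().to_numpy().max()
```

The related methods add, sub and mul take the same axis argument, so the same pattern should apply to other row-wise operations.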
