
Can a pandas series be a column rather than a row?

This is a real question, though it may look like hair-splitting at first glance. Basically, I want to treat a Series as a column rather than a row. I think that makes intuitive sense, even if a Series cannot technically be a row or a column (?), whereas a 1d numpy array can be reshaped into either. The example:

import pandas as pd

df = pd.DataFrame( { 'a' : [5,3,1],
                     'b' : [4,6,2],
                     'c' : [2,4,9] } )

df['rowsum'] = df.sum(axis=1)  # sum across the columns of each row

In [31]: df
Out[31]: 
   a  b  c  rowsum
0  5  4  2      11
1  3  6  4      13
2  1  2  9      12

I just want to get percentages by row (so rows sum to 1). I would like to do this:

df.iloc[:,0:3] / df.rowsum

which works fine in numpy (with reshape), since you can make rowsum a column or a row vector. But here I can't reshape the Series, and using T on df.rowsum has no effect. It seems a DataFrame can be transposed but not a Series. This could be coded naturally in numpy, but that involves converting to arrays and then back to a DataFrame. The following works (along with several other solutions):

In [32]: ( df.iloc[:,0:3].T / df.rowsum ).T
Out[32]: 
          a         b         c
0  0.454545  0.363636  0.181818
1  0.230769  0.461538  0.307692
2  0.083333  0.166667  0.750000
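
For comparison, here is a minimal sketch of the numpy round trip mentioned above, assuming the same three-column df (rebuilt here without the rowsum column so the snippet runs on its own). The names values, rowsum and pct are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': [5, 3, 1], 'b': [4, 6, 2], 'c': [2, 4, 9]})

values = df.to_numpy()                      # plain 2-D array, shape (3, 3)
rowsum = values.sum(axis=1, keepdims=True)  # shape (3, 1): an explicit column vector

# numpy broadcasts the (3, 1) column across the three columns of values;
# the labels are then reattached by hand.
pct = pd.DataFrame(values / rowsum, index=df.index, columns=df.columns)
```

Keeping the sum two-dimensional with keepdims=True is what makes it behave as a column here; the cost is that the index and column labels have to be passed back in explicitly.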

I'm sorry if this seems trivial but it's valuable to be able to code in terms of rows and columns in an intuitive way. So the question is merely: can I make a series act like a column vector rather than a row vector?

Also it seems inconsistent that this will work fine on a column.

df.iloc[:,0] / df.rowsum

pandas appears in this case to be dividing (elementwise) two column arrays (judging by the display, even if the row/column distinction is artificial). But when the first part of that expression is changed from a Series to a DataFrame, rowsum seems to effectively go from being a 3x1 to a 1x3. It's like going from a Series to a DataFrame is an implicit transpose operation?

Maybe a better way to think about it:

all( dist.iloc[:,:10].index == dist.rowsum.index )
Out[1526]: True

The indexes line up here, so why does pandas seem to use the index differently for Series/Series broadcasting than for DataFrame/Series broadcasting? Or am I just thinking about this completely wrong?
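
A small sketch of the behaviour being described, assuming the three-column df from the example (rebuilt here so the snippet is self-contained; the variable names are illustrative). By default, DataFrame/Series arithmetic matches the Series index against the DataFrame's column labels, while Series/Series arithmetic matches index against index.

```python
import pandas as pd

df = pd.DataFrame({'a': [5, 3, 1], 'b': [4, 6, 2], 'c': [2, 4, 9]})
rowsum = df.sum(axis=1)   # Series indexed by the row labels 0, 1, 2

# Series / Series: both sides are indexed 0, 1, 2, so values pair up row by row.
col_ratio = df['a'] / rowsum

# DataFrame / Series: the Series labels (0, 1, 2) are matched against the
# column labels ('a', 'b', 'c'). Nothing matches, so every cell is NaN.
misaligned = df / rowsum

# A Series indexed by the column labels lines up without any extra arguments.
by_column = df / df.sum(axis=0)
```

So the Series is not really a row or a column; what matters is which axis of the DataFrame its labels are compared with.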

Try this:

df.apply(lambda x: x / x['rowsum'], axis=1)

        a          b           c    rowsum
0   0.454545    0.363636    0.181818    1
1   0.230769    0.461538    0.307692    1
2   0.083333    0.166667    0.750000    1

If you don't need the rowsum column, you can use

df.apply(lambda x: x / x.sum(), axis=1)  # with your original DataFrame (no rowsum column)

Try

df.iloc[:, 0:3].div(df.rowsum, axis=0)

to see if it's what you want.
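
A short sketch of why this should work, assuming the df from the question (rebuilt here without the rowsum column; pct and via_transpose are illustrative names). Passing axis=0 tells div to match the Series index against the DataFrame's row index instead of its columns, which is the "column vector" behaviour the question asks for.

```python
import pandas as pd

df = pd.DataFrame({'a': [5, 3, 1], 'b': [4, 6, 2], 'c': [2, 4, 9]})
rowsum = df.sum(axis=1)

# axis=0: align rowsum's index (0, 1, 2) with df's row index.
pct = df.div(rowsum, axis=0)

# The double-transpose workaround from the question, for comparison.
via_transpose = (df.T / rowsum).T

print(pct.round(6))
```

The same axis argument is available on the other flexible arithmetic methods (add, sub, mul), so the pattern is not specific to division.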
