Pandas vs. Numpy Dataframes

Look at these few lines of code:

df2 = df.copy()
df2[1:] = df[1:]/df[:-1].values -1
df2.iloc[0, :] = 0  # .ix has been removed from pandas; .iloc selects the first row by position

Our instructor said we need to use the .values attribute to access the underlying NumPy array; otherwise our code wouldn't work.

I understand that a pandas DataFrame has an underlying representation as a NumPy array, but I don't understand why we can't operate directly on the DataFrame using just slicing.

Could you explain why?

pandas is built around labeled tabular data, so when it performs arithmetic (addition, subtraction, division, and so on) it matches operands by their labels, not by their positions.
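As a minimal sketch of that label matching (the Series s1 and s2 and their values are made up for illustration and are not part of the question), adding two Series whose labels are in a different order pairs the values by label, while converting one side to a NumPy array pairs them by position:

```python
import pandas as pd

# Two hypothetical Series with the same labels in a different order.
s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20, 30], index=["c", "b", "a"])

# Label-based: "a" is paired with "a", "b" with "b", "c" with "c".
by_label = s1 + s2

# Position-based: a plain NumPy array carries no labels,
# so the first element is paired with the first, and so on.
by_position = s1 + s2.to_numpy()

print(by_label)
print(by_position)
```

Here by_label["a"] is 1 + 30, whereas the first element of by_position is 1 + 10.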

Consider the following DataFrame:

df = pd.DataFrame(np.random.randn(5, 3), index=list('abcde'), columns=list('xyz'))

Here, df[1:] is:

df[1:]
Out: 
          x         y         z
b  1.003035  0.172960  1.160033
c  0.117608 -1.114294 -0.557413
d -1.312315  1.171520 -1.034012
e -0.380719 -0.422896  1.073535

And df[:-1] is:

df[:-1]
Out: 
          x         y         z
a  1.367916  1.087607 -0.625777
b  1.003035  0.172960  1.160033
c  0.117608 -1.114294 -0.557413
d -1.312315  1.171520 -1.034012

If you compute df[1:] / df[:-1], pandas divides row b by row b, row c by row c, and row d by row d. Rows a and e each exist in only one of the two frames, so pandas cannot find a matching row in the other frame and returns NaN for them:

df[1:] / df[:-1]
Out: 
     x    y    z
a  NaN  NaN  NaN
b  1.0  1.0  1.0
c  1.0  1.0  1.0
d  1.0  1.0  1.0
e  NaN  NaN  NaN

If you just want element-wise division that ignores the labels, accessing the underlying NumPy array with .values on one of the frames is a way of telling pandas not to align. Since NumPy arrays don't have labels, pandas pairs the elements purely by position and keeps the labels of the remaining DataFrame:

df[1:]/df[:-1].values
Out: 
           x         y         z
b   0.733258  0.159028 -1.853749
c   0.117252 -6.442482 -0.480515
d -11.158359 -1.051357  1.855018
e   0.290112 -0.360981 -1.038223
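The whole comparison can be reproduced with fixed numbers. The sketch below uses a small made-up frame (not the random one above) and assumes the goal of the original snippet is a row-over-previous-row return. It uses .to_numpy(), which the pandas documentation recommends over .values in current versions, and it also shows a shift-based variant as one possible label-preserving alternative, not as the instructor's method:

```python
import numpy as np
import pandas as pd

# Hypothetical data chosen so the results are easy to check by hand.
df = pd.DataFrame(
    {"x": [1.0, 2.0, 4.0, 8.0], "y": [10.0, 5.0, 5.0, 10.0]},
    index=list("abcd"),
)

# Label-aligned division: rows b and c are divided by themselves,
# rows a and d have no partner and become NaN.
aligned = df[1:] / df[:-1]

# Positional division: each row is divided by the row before it.
positional = df[1:] / df[:-1].to_numpy() - 1

# Alternative that stays inside pandas: shift moves the values down
# one row while keeping the labels, so alignment does what we want.
shifted = df / df.shift(1) - 1

print(aligned)
print(positional)
print(shifted)
```

In this sketch, positional and shifted agree on rows b through d; shifted additionally has a NaN row for a, which is the row the original code sets to 0.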

