
Sliced column of a Pandas DataFrame keeps mentioning the original column name in new objects created from the column

I sliced a column from a pandas DataFrame to create the object label. The name of the column in the original DataFrame was y.

Now when I take the sum of label and assign it to m, printing m keeps showing y. Why does it do that, and what does y 50.0 mean?

>>> type(label)
<class 'pandas.core.frame.DataFrame'>
>>> label.head(2)
     y
0  1.0
1  1.0
>>> m = label.sum()
>>> m
y    50.0
dtype: float64
>>> 

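Your label DataFrame contains only one column, named y, whose values add up to 50.0 (for example, 50 rows of 1.0), so sum() returned the sum of y. DataFrame.sum() reduces each column to a single value, and the result is a Series indexed by the column names; that is why y appears next to 50.0. Note that y is a label in the index, not the index's name (m.index.name is None). You can relabel it by assigning a list-like of the same length, e.g. m.index = ['total'], but assigning a scalar such as m.index = None raises a TypeError.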

>>> import pandas as pd
>>> import numpy as np

>>> df = pd.DataFrame(np.ones(50), columns=['y'])
>>> df.head(2)
     y
0  1.0
1  1.0
>>> df
      y
0   1.0
1   1.0
2   1.0
3   1.0
4   1.0
... # truncated
48  1.0
49  1.0
>>> df.sum()
y    50.0
dtype: float64

>>> m = df.sum()
>>> m
y    50.0
dtype: float64
>>> m.index
Index(['y'], dtype='object')
>>> m.index = None
Traceback (most recent call last):
 ...
TypeError: Index(...) must be called with a collection of some kind, None was passed
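A minimal sketch of the relabelling options, assuming a stand-in frame built like the one above (df and m are illustrative names, and the new label "total" is hypothetical). It shows that y is a label inside the index rather than the index's name, and two ways to change it:

```python
import numpy as np
import pandas as pd

# Assumed stand-in for the asker's data: one column 'y' of 50 ones.
df = pd.DataFrame(np.ones(50), columns=["y"])
m = df.sum()

# 'y' is a label *in* the index; the index itself has no name.
print(list(m.index))   # ['y']
print(m.index.name)    # None

# Relabel in place by assigning a list-like of the same length as the index.
m.index = ["total"]
print(m)

# Or get a relabelled copy: a dict passed to Series.rename maps old labels to new ones.
m2 = df.sum().rename({"y": "total"})
print(m2)
```

A scalar such as "total" on its own would hit the same TypeError as None, which is why the list is needed.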

You might be expecting m to be a float. It isn't; m is a Series.

>>> type(m)  # the type of `m` itself
<class 'pandas.core.series.Series'>

>>> m.dtype  # the type of the values stored in `m`
dtype('float64')

DataFrame.sum() generally returns a Series (or a DataFrame in some cases). See the pandas documentation for DataFrame.sum.

That's why printing m didn't show just the number 50.0: you got the Series m, with y as its index label and 50.0 as its value.
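If what you actually want is the plain number, you can pull it out of the one-element Series. A small sketch, again using an assumed stand-in frame for label:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.ones(50), columns=["y"])  # assumed stand-in for `label`
m = df.sum()  # Series with index ['y'] and a single value

by_label = m["y"]        # select by index label
by_position = m.iloc[0]  # select by position
as_python = m.item()     # Python float; only valid when the Series has exactly one element

print(by_label, by_position, as_python)
```

m.item() raises a ValueError if the Series has more than one element, which makes it a handy guard when you expect a single column.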

Use label['y'].sum()

label is a pd.DataFrame object, and pd.DataFrame.sum is different from pd.Series.sum. "Summing a DataFrame" with no arguments means summing over all rows (index labels) for each column. If you want to be explicit about this, you can pass axis=0, but it is not required:

sums_by_col = label.sum(axis=0)
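To see what axis changes, here is a sketch with a hypothetical two-column frame (the column z and all values are made up for illustration): axis=0 gives one total per column, axis=1 gives one total per row, and summing the per-column totals gives a single grand total.

```python
import pandas as pd

# Hypothetical two-column frame so the axis difference is visible.
frame = pd.DataFrame({"y": [1.0, 2.0, 3.0], "z": [10.0, 20.0, 30.0]})

per_column = frame.sum(axis=0)  # one value per column, indexed by column name
per_row = frame.sum(axis=1)     # one value per row, indexed by row label
grand_total = frame.sum().sum() # Series.sum of the per-column totals -> a scalar

print(per_column)
print(per_row)
print(grand_total)
```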

But what you really want is pd.Series.sum:

sum_of_series = label['y'].sum()
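The question doesn't show how label was sliced, but one common way to end up with a one-column DataFrame instead of a Series is selecting with a list of labels (double brackets). The sketch below assumes a hypothetical original frame df with columns x and y; it compares both selections and shows DataFrame.squeeze as another way to turn a one-column frame into a Series:

```python
import numpy as np
import pandas as pd

# Hypothetical original frame; only column 'y' matters here.
df = pd.DataFrame({"x": np.arange(50.0), "y": np.ones(50)})

as_frame = df[["y"]]  # list of labels -> one-column DataFrame (sum() gives a Series)
as_series = df["y"]   # single label  -> Series (sum() gives a scalar)

# Collapse the single column of a one-column DataFrame into a Series.
squeezed = as_frame.squeeze(axis="columns")

print(type(as_frame).__name__, type(as_frame.sum()).__name__)
print(type(as_series).__name__, as_series.sum())
print(type(squeezed).__name__, squeezed.sum())
```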
