
Pandas pd.Series returns a DataFrame

I have a question about the pandas Series.

I am reading an O'Reilly book on Python for data science and am working through the section on pandas.

Consider the following code:

import numpy as np
import pandas as pd
frame = pd.DataFrame(np.random.randn(4,3), columns=list('bde'),
                index=['Utah', 'Ohio', 'Texas', 'Oregon'])

This code provides the following result.

        b        d         e
Utah    -0.81    0.43      -0.50
Ohio    1.67     -0.67     1.30
Texas   0.53     -0.32     0.80
Oregon  0.25     0.91      0.70    

All values were manually rounded to 2 decimal places for convenience on SO.

Now, I learnt that functions can also return Series with multiple values:

def f(x):
    return pd.Series([x.min(), x.max()], index=['min', 'max'])

as the literature states:

Many of the most common array statistics (like sum and mean) are DataFrame methods, so using apply is not necessary. The function passed to apply need not return a scalar value; it can also return a Series with multiple values.

and running the following code

frame.apply(f)

produces the following result:

        b        d        e
min     -0.81    -0.67    -0.50
max     1.67     0.91     1.30

This code works.

However, I'm confused here.

I thought a Series could only be one-dimensional, i.e. a single-column-like data structure with one index label per element.

For example:

 >>> s = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])

 >>> s

a    0.469112
b   -0.282863
c   -1.509059
d   -1.135632
e    1.212112
dtype: float64

However, the result of frame.apply(f) looks like a two-dimensional Series, which doesn't make sense to me.

How did the function produce what appears to be a two-dimensional Series?

Interestingly, doing

type(frame.applymap(format))

returns

pandas.core.frame.DataFrame

and I don't understand why.
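On the applymap call: it applies a function to each element, so the result keeps the shape of the input and is still a DataFrame. Below is a minimal sketch, assuming a two-row subset of the rounded values from the table. DataFrame.applymap was renamed to DataFrame.map in pandas 2.1, so the sketch picks whichever method the installed version provides; the `elementwise` name is only an illustration.

```python
import pandas as pd

# Two rows of rounded values taken from the table in the question.
frame = pd.DataFrame({'b': [-0.81, 1.67], 'd': [0.43, -0.67]},
                     index=['Utah', 'Ohio'])

# applymap was renamed to map in pandas 2.1; use whichever exists here.
elementwise = frame.map if hasattr(frame, 'map') else frame.applymap

# format() is called once per cell and returns a string for that cell.
formatted = elementwise(format)

print(type(formatted).__name__)        # class of the result
print(formatted.shape == frame.shape)  # same rows and columns as the input
print(type(formatted.loc['Utah', 'b']).__name__)
```

Each cell is replaced by the return value of format, so the container stays two-dimensional; only the cell contents change.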

A Pandas Series is a 1D array of some type. A DataFrame is a 2D array where each column is a Series and they can have different types.
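Applied to the example in the question, this means f returns an ordinary one-dimensional Series for each column, and apply then assembles those per-column Series into a DataFrame. The sketch below uses the rounded values from the question's table. The `manual` reconstruction is a hypothetical illustration of the idea, not a description of how pandas implements apply internally.

```python
import pandas as pd

# Rounded values copied from the table in the question.
frame = pd.DataFrame(
    [[-0.81, 0.43, -0.50],
     [1.67, -0.67, 1.30],
     [0.53, -0.32, 0.80],
     [0.25, 0.91, 0.70]],
    columns=list('bde'),
    index=['Utah', 'Ohio', 'Texas', 'Oregon'],
)

def f(x):
    # x is one column of the frame, i.e. a Series.
    return pd.Series([x.min(), x.max()], index=['min', 'max'])

# Calling f on a single column gives a 1-D Series with two entries.
one_column = f(frame['b'])
print(type(one_column).__name__, one_column.ndim)

# apply calls f once per column and combines the returned Series.
result = frame.apply(f)
print(type(result).__name__, result.ndim)

# Illustrative manual equivalent: one Series per column, gathered into a frame.
manual = pd.DataFrame({name: f(frame[name]) for name in frame.columns})
print(manual)
```

So no Series ever becomes two-dimensional: the second dimension comes from apply collecting one Series per column.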

However, the part you are probably missing is that the "type" can be the generic Python object which is a reference to any object. For example:

pd.Series([[1,2],[3,4]])

Gives you:

0    [1, 2]
1    [3, 4]
dtype: object

That is a 1D array of Python lists (which do not even have to have uniform length).
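A quick way to check this is to inspect the Series directly. This is a small sketch; the list values are made up for illustration and deliberately have different lengths.

```python
import pandas as pd

# Two Python lists of different lengths stored as the elements of one Series.
s = pd.Series([[1, 2], [3, 4, 5]])

print(s.ndim)    # number of dimensions of the Series itself
print(s.shape)   # one entry per list, regardless of list length
print(s.dtype)   # generic Python object dtype

# Each element is still a plain Python list.
print(s.map(len).tolist())
```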

Using object dtype in Pandas (or NumPy) is usually suboptimal and should be avoided where possible. In the above example you can just replace Series with DataFrame for a more optimal representation. The object dtype is suboptimal because Pandas does not natively understand most of it, so any operations have to be done using the Python interpreter on each value in the array, rather than accelerated by compiled code as would be the case if the dtype were int or some other type Pandas natively understands.
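To make the difference concrete, here is a minimal sketch that stores the same numbers both ways and computes row sums. The variable names are illustrative; no timings are shown, since those depend on the data size and the machine.

```python
import pandas as pd

# Same data, two representations.
as_objects = pd.Series([[1, 2], [3, 4]])    # object dtype: each element is a list
as_frame = pd.DataFrame([[1, 2], [3, 4]])   # integer columns pandas understands

print(as_objects.dtype)
print(as_frame.dtypes.tolist())

# Object dtype: Python's sum() is called on each list, one at a time.
row_sums_objects = as_objects.map(sum)

# Numeric dtype: the row sum is computed by pandas' built-in numeric code.
row_sums_frame = as_frame.sum(axis=1)

print(row_sums_objects.tolist(), row_sums_frame.tolist())
```

Both give the same totals, but only the DataFrame version lets pandas work on the numbers without going through the Python interpreter for every element.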
