
Pandas data slicing by column names

I am learning Pandas and trying to understand slicing. Everything makes sense except when I try to slice using column names. My DataFrame looks like this:

              area       pop
California  423967  38332521
Florida     170312  19552860
Illinois    149995  12882135
New York    141297  19651127
Texas       695662  26448193

When I do data['area':'pop'], I expected both columns to show, since I am using an explicit index and both the start and end of a label slice should be inclusive. Instead, the result is an empty DataFrame.

I also get an empty DataFrame for data['area':]. Why is this different from slicing with explicit indexes elsewhere?
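A minimal reproduction of the question, assuming the frame is rebuilt by hand from the values printed above (the state names become the row index; the variable names are illustrative):

```python
import pandas as pd

# Assumption: frame rebuilt by hand from the values shown in the question.
# The state names form the (alphabetically sorted) row index.
data = pd.DataFrame(
    {"area": [423967, 170312, 149995, 141297, 695662],
     "pop": [38332521, 19552860, 12882135, 19651127, 26448193]},
    index=["California", "Florida", "Illinois", "New York", "Texas"],
)

# A slice inside [] is matched against the row labels, not the column names.
by_name = data["area":"pop"]
open_ended = data["area":]

print(by_name.shape)     # (0, 2): no rows, but both columns are still there
print(open_ended.shape)  # (0, 2)
```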

According to the documentation:

With DataFrame, slicing inside of [] slices the rows. This is provided largely as a convenience since it is such a common operation.
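As a sketch of what that sentence means in practice, using the same hand-built frame (an assumption, not the asker's actual loading code): a slice in [] always picks rows. Given row labels, it is label-based and includes both ends; given integers, it is positional and excludes the end.

```python
import pandas as pd

# Assumption: frame rebuilt by hand from the values shown in the question.
data = pd.DataFrame(
    {"area": [423967, 170312, 149995, 141297, 695662],
     "pop": [38332521, 19552860, 12882135, 19651127, 26448193]},
    index=["California", "Florida", "Illinois", "New York", "Texas"],
)

# Label slice on the row index: both endpoints are included.
middle_states = data["Florida":"New York"]
print(list(middle_states.index))  # ['Florida', 'Illinois', 'New York']

# Integer slice inside []: positional, end excluded, and it still selects rows.
first_two = data[0:2]
print(list(first_two.index))      # ['California', 'Florida']
```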

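You get an empty DataFrame because the slice is looked up in the row index, which contains the state names, and 'area' and 'pop' are not there. Because that index is sorted, pandas places the missing labels where they would fall instead of raising an error. Lowercase strings sort after all the capitalized state names, so both bounds land past the last row. Here is what you get with a numeric index instead: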

>> data.reset_index()['area':'pop']
TypeError: cannot do slice indexing on <class 'pandas.core.indexes.range.RangeIndex'> with these indexers [area] of <class 'str'>
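The following sketch, again on a hand-built copy of the frame, shows where the empty result comes from. Index.slice_locs reports the row positions a label slice resolves to, and the string bounds resolve to an empty range at the end of the sorted index. The RangeIndex case is caught so the example keeps running; the exact error message varies between pandas versions, so it is printed rather than assumed.

```python
import pandas as pd

# Assumption: frame rebuilt by hand from the values shown in the question.
data = pd.DataFrame(
    {"area": [423967, 170312, 149995, 141297, 695662],
     "pop": [38332521, 19552860, 12882135, 19651127, 26448193]},
    index=["California", "Florida", "Illinois", "New York", "Texas"],
)

# The row index is sorted, so missing labels are placed by sort order.
print(data.index.is_monotonic_increasing)  # True
print("area" > "Texas")                    # True: lowercase sorts after uppercase

# Both bounds resolve to position 5 (one past the last row): an empty range.
start, stop = data.index.slice_locs("area", "pop")
print(int(start), int(stop))               # 5 5

# With a RangeIndex, string bounds are rejected outright.
raised_type_error = False
try:
    data.reset_index()["area":"pop"]
except TypeError as exc:
    raised_type_error = True
    print(exc)
```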

What you want instead is label-based slicing of the columns with .loc, which includes both endpoints:

>> data.loc[:, 'area':'pop']
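A short sketch of the .loc form on the same hand-built frame (variable names are illustrative). The first position inside .loc[...] selects rows and the second selects columns, so the label slice now runs over the column names and includes 'pop':

```python
import pandas as pd

# Assumption: frame rebuilt by hand from the values shown in the question.
data = pd.DataFrame(
    {"area": [423967, 170312, 149995, 141297, 695662],
     "pop": [38332521, 19552860, 12882135, 19651127, 26448193]},
    index=["California", "Florida", "Illinois", "New York", "Texas"],
)

# ':' keeps every row; 'area':'pop' is a label slice over the columns.
both = data.loc[:, "area":"pop"]
print(list(both.columns))  # ['area', 'pop']
print(both.shape)          # (5, 2)

# An open-ended label slice runs to the last column.
from_area = data.loc[:, "area":]
print(list(from_area.columns))  # ['area', 'pop']

# Rows and columns can be label-sliced in the same call.
subset = data.loc["Florida":"New York", "area":"pop"]
print(subset.shape)  # (3, 2)
```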

If you want to get the two columns, use:

import pandas as pd

# data = pd.read_csv('data.csv', index_col=0)  # assumes the first CSV column holds the state names

both = data[['area', 'pop']]

So you can pass a list of columns to [] to select those columns, in that order.

Similarly, to get only the area column (as a one-column DataFrame), use:

area = data[['area']]

Now, if you want to get the values of the columns, use:

both = data[['area', 'pop']].values
area = data[['area']].values

Here both and area are NumPy arrays.
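A sketch contrasting these forms on a hand-built copy of the frame (the values are copied from the question; variable names are illustrative). It also shows DataFrame.to_numpy(), the newer pandas method that returns the same kind of array as .values:

```python
import pandas as pd

# Assumption: frame rebuilt by hand from the values shown in the question.
data = pd.DataFrame(
    {"area": [423967, 170312, 149995, 141297, 695662],
     "pop": [38332521, 19552860, 12882135, 19651127, 26448193]},
    index=["California", "Florida", "Illinois", "New York", "Texas"],
)

# A list inside [] selects columns by name, in the order given.
swapped = data[["pop", "area"]]
print(list(swapped.columns))  # ['pop', 'area']

# Single brackets give a Series; a one-item list gives a one-column DataFrame.
print(type(data["area"]).__name__)    # Series
print(type(data[["area"]]).__name__)  # DataFrame

# .values and .to_numpy() both return a NumPy array shaped like the selection.
print(data[["area", "pop"]].values.shape)  # (5, 2)
print(data[["area"]].to_numpy().shape)     # (5, 1)
```

Note that the one-item list keeps a 2-D shape, (5, 1), whereas data['area'].to_numpy() would be 1-D.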


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM