
Pandas data slicing by column names

I am learning Pandas and trying to understand slicing. Everything makes sense except when I try to slice using column names. My data frame looks like this:

              area       pop
California  423967  38332521
Florida     170312  19552860
Illinois    149995  12882135
New York    141297  19651127
Texas       695662  26448193

When I do data['area':'pop'], I expect both columns to show, since I am using the explicit index and both the start and end of the slice should be inclusive. Instead, the result is an empty dataframe.

I also get an empty dataframe for data['area':]. Why is this different from slicing with explicit indexes elsewhere?

According to the documentation:

With DataFrame, slicing inside of [] slices the rows. This is provided largely as a convenience since it is such a common operation.

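You get an empty DataFrame because the slice is applied to the row index, which holds the state names, and 'area' and 'pop' are not among them. Because that index is sorted, pandas does not raise an error: it works out where the missing labels would fall, and both lowercase strings sort after every capitalized state name, so the slice selects nothing. Here is what you get with a numeric index: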

>> data.reset_index()['area':'pop']
TypeError: cannot do slice indexing on <class 'pandas.core.indexes.range.RangeIndex'> with these indexers [area] of <class 'str'>
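To make the row-slicing behaviour concrete, here is a minimal sketch that rebuilds the frame from the question by hand (the constructor call is an assumption about how the data was created) and compares a slice between existing row labels with the slice from the question. The names rows, empty and after_last_row are illustrative only.

```python
import pandas as pd

# Rebuild the example frame; the state names form the row index.
data = pd.DataFrame(
    {
        "area": [423967, 170312, 149995, 141297, 695662],
        "pop": [38332521, 19552860, 12882135, 19651127, 26448193],
    },
    index=["California", "Florida", "Illinois", "New York", "Texas"],
)

# A slice inside [] is applied to the row labels, with both ends inclusive.
rows = data["Florida":"New York"]

# 'area' and 'pop' are not row labels. On this sorted index pandas places
# them where they would sort, which is after 'Texas', so nothing is selected.
empty = data["area":"pop"]
after_last_row = "area" > "Texas"  # lowercase sorts after uppercase

print(rows)
print(empty.empty, after_last_row)
```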

What you want instead is:

>> data.loc[:, 'area':'pop']
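As a small illustration of the .loc form, the sketch below rebuilds the same frame (again an assumption about how it was created) and slices the columns by label. The first slot of .loc addresses rows and the second addresses columns, and label slices include both endpoints. The .iloc line shows the positional equivalent, where the end is exclusive; the variable names are illustrative.

```python
import pandas as pd

data = pd.DataFrame(
    {
        "area": [423967, 170312, 149995, 141297, 695662],
        "pop": [38332521, 19552860, 12882135, 19651127, 26448193],
    },
    index=["California", "Florida", "Illinois", "New York", "Texas"],
)

# All rows, columns from 'area' through 'pop' (both ends included).
by_label = data.loc[:, "area":"pop"]

# Open-ended label slice: every column from 'pop' onward.
from_pop = data.loc[:, "pop":]

# Positional equivalent of the first slice; here the end index is exclusive.
by_position = data.iloc[:, 0:2]

print(by_label)
```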

If you want to get the two columns, use:

import pandas as pd

# data = pd.read_csv('data.csv', header=0)  # header must be an int, a list of ints, or None, not True

all = data[['area','pop']]

So you can pass a list of columns to [] to select columns in that order.

Similarly, to get only the area column, use:

area = data[['area']]
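The following sketch, which assumes the same hand-built frame as in the question, shows two details of list-based selection: the result follows the order of the list, and a one-element list still returns a DataFrame, whereas a bare column name returns a Series. The variable names are illustrative.

```python
import pandas as pd

data = pd.DataFrame(
    {
        "area": [423967, 170312, 149995, 141297, 695662],
        "pop": [38332521, 19552860, 12882135, 19651127, 26448193],
    },
    index=["California", "Florida", "Illinois", "New York", "Texas"],
)

# Columns come back in the order given in the list.
swapped = data[["pop", "area"]]

# A one-element list keeps the result two-dimensional (a DataFrame) ...
area_frame = data[["area"]]

# ... while a plain label returns a one-dimensional Series.
area_series = data["area"]

print(type(area_frame).__name__, area_frame.shape)
print(type(area_series).__name__, area_series.shape)
```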

Now, if you want to get the values of the columns, use:

all = data[['area','pop']].values
area = data[['area']].values

Both all and area will then be NumPy arrays.
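A short sketch of what those arrays look like, assuming the same hand-built frame. It uses to_numpy(), which returns the same kind of array as .values for this data, and a variable called both rather than all so that Python's built-in all() is not shadowed; both names are choices made for this example only.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(
    {
        "area": [423967, 170312, 149995, 141297, 695662],
        "pop": [38332521, 19552860, 12882135, 19651127, 26448193],
    },
    index=["California", "Florida", "Illinois", "New York", "Texas"],
)

# Two columns selected with a list give a 2-D array, one row per state.
both = data[["area", "pop"]].to_numpy()

# A one-element list still gives a 2-D array with a single column.
area_2d = data[["area"]].to_numpy()

# Selecting the column by name first gives a flat 1-D array.
area_1d = data["area"].to_numpy()

# Same numbers in both area arrays, only the shape differs.
same_values = np.array_equal(area_2d.ravel(), area_1d)

print(both.shape, area_2d.shape, area_1d.shape, same_values)
```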
