Python list or pandas dataframe arbitrary indexing and slicing

I have used both R and Python extensively in my work, and at times I get the syntax between them confused.

In R, if I want to build a model from only some features of my data set, I can do something like this:

subset = df[1:1000, c(1,5,14:18,24)]

This would take the first 1000 rows (yes, R starts on index 1), and it would take the 1st, 5th, 14th through 18th, and 24th columns.

I have tried various combinations of slice, range, and similar functions, and have not been able to reproduce this sort of flexibility. In the end, I just enumerated all of the values.

How can this be done in Python?

That is, how do I pick an arbitrary subset of elements from a list, some of which are selected individually (as with the commas shown above) and some selected sequentially (as with the colons shown above)?

In its index_tricks file, numpy defines a class instance, r_, that converts scalars and slices into an enumerated array:

In [560]: np.r_[1,5,14:18,24]
Out[560]: array([ 1,  5, 14, 15, 16, 17, 24])

It's an instance with a __getitem__ method, so it uses the indexing syntax. It expands 14:18 into np.arange(14,18). It can also expand values with linspace.
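A minimal sketch of what np.r_ produces, assuming only that numpy is imported as np; the variable names are illustrative and not from the original answer. A slice with an imaginary step is read as a number of points, which is the linspace behaviour mentioned above.

```python
import numpy as np

# Scalars and slices are concatenated into one integer array.
# As with any Python slice, the stop value (18) is excluded.
idx = np.r_[1, 5, 14:18, 24]

# The slice part is equivalent to np.arange(14, 18).
same = np.concatenate(([1, 5], np.arange(14, 18), [24]))

# An imaginary step means "this many points, endpoints included",
# which matches np.linspace(0, 1, 5).
points = np.r_[0:1:5j]

# The index array can also pick elements out of a plain Python list.
letters = list("abcdefghijklmnopqrstuvwxyz")
picked = [letters[i] for i in idx]

print(idx)
print(points)
print(picked)
```

The list comprehension at the end is one way to cover the plain-list case from the question, since a Python list does not accept an array of indices directly.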

So I think you'd rewrite

subset = df[1:1000, c(1,5,14:18,24)]

as

df.iloc[:1000, np.r_[0,4,13:18,23]]
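To check the translation end to end, here is a small sketch on a made-up frame; the name df, its shape, and the column labels c1..c30 are assumptions for illustration. Note that R's 14:18 includes column 18, while a Python slice excludes its stop, so the zero-based equivalent of columns 14 through 18 is 13:18.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 1200 rows, 30 columns labelled c1..c30 (1-based names,
# so the labels line up with the R column numbers).
df = pd.DataFrame(
    np.arange(1200 * 30).reshape(1200, 30),
    columns=[f"c{i}" for i in range(1, 31)],
)

# R: df[1:1000, c(1, 5, 14:18, 24)]
# Zero-based positions; the slice stop is exclusive, hence 13:18.
cols = np.r_[0, 4, 13:18, 23]
subset = df.iloc[:1000, cols]

print(subset.shape)
print(list(subset.columns))
```

The printed column names make it easy to confirm that the 1st, 5th, 14th through 18th, and 24th columns were selected.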

You can use iloc for integer indexing in pandas:

df.iloc[0:1000, [0, 4] + range(13,18) + [23]]

As commented by @root, in Python 3 range() no longer returns a list, so you need to convert it explicitly: df.iloc[0:1000, [0, 4] + list(range(13,18)) + [23]]
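A short Python 3 sketch of the list-based approach, assuming a hypothetical 1200 x 30 frame named df. It shows the explicit list() conversion next to iterable unpacking, an alternative not mentioned in the answer that builds the same list.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with default integer column labels.
df = pd.DataFrame(np.zeros((1200, 30)))

# Explicit conversion, as suggested in the comment.
cols_concat = [0, 4] + list(range(13, 18)) + [23]

# Unpacking the range inside a list literal gives the same result.
cols_unpack = [0, 4, *range(13, 18), 23]

subset = df.iloc[0:1000, cols_unpack]
print(cols_unpack)
print(subset.shape)
```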

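Try this. The first set of square brackets selects the columns; the second set slices the rows. Note that df[...] selects columns by label, so this works only when the column labels are the integers themselves. In Python 3, range() must be wrapped in list().

df[[0, 4] + list(range(13, 18)) + [23]][:1000]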
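Because plain df[...] looks columns up by label rather than by position, the sketch below compares it with iloc on a hypothetical frame and shows one possible way to adapt it when the columns have string names. The frame, its shape, and the names c1..c30 are assumptions for illustration.

```python
import numpy as np
import pandas as pd

positions = [0, 4] + list(range(13, 18)) + [23]

# Case 1: default integer labels, so labels and positions coincide.
df_int = pd.DataFrame(np.arange(1200 * 30).reshape(1200, 30))
by_label = df_int[positions][:1000]
by_position = df_int.iloc[:1000, positions]

# Case 2: string labels. Translate positions into labels first,
# then apply the same two-bracket pattern.
df_named = df_int.copy()
df_named.columns = [f"c{i}" for i in range(1, 31)]
named_subset = df_named[df_named.columns[positions]][:1000]

print(by_label.equals(by_position))
print(list(named_subset.columns))
```

With named columns, iloc avoids the extra lookup through df.columns, which is one reason the other answers prefer it.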
