Python list or pandas dataframe arbitrary indexing and slicing
I have used both R and Python extensively in my work, and at times I get the syntax between them confused.
In R, if I want to create a model from only some features of my data set, I can do something like this:
subset = df[1:1000, c(1,5,14:18,24)]
This takes the first 1000 rows (yes, R starts at index 1) and the 1st, 5th, 14th through 18th, and 24th columns.
I have tried every combination of slice, range, and similar functions, and have not been able to duplicate this sort of flexibility. In the end, I just enumerated all of the values.
How can this be done in Python? That is, how do I pick an arbitrary subset of elements from a list, some selected individually (as with the commas shown above) and some selected sequentially (as with the colons shown above)?
In its index_tricks file, numpy defines a class instance, r_, that converts scalars and slices into a single enumerated array when it is indexed:
In [560]: np.r_[1,5,14:18,24]
Out[560]: array([ 1, 5, 14, 15, 16, 17, 24])
It's an instance with a __getitem__ method, so it uses the indexing syntax rather than a function call. It expands 14:18 into np.arange(14,18). It can also expand values with linspace.
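To make the expansion concrete, here is a small sketch (the variable names are mine, not from the answer). It checks that np.r_ with a slice matches concatenating the scalars with np.arange by hand, and it shows the linspace-style form, where an imaginary step is read as a number of points:

```python
import numpy as np

# Scalars and slices are concatenated into one 1-D integer array.
idx = np.r_[1, 5, 14:18, 24]

# The same array built by hand: the slice 14:18 becomes np.arange(14, 18).
manual = np.concatenate([[1, 5], np.arange(14, 18), [24]])
same = np.array_equal(idx, manual)

# An imaginary step means "number of points", endpoints included,
# which is how np.r_ exposes linspace-style expansion.
points = np.r_[0:1:5j]

print(idx, same, points)
```

Like any Python slice, 14:18 stops before 18, which matters when translating an inclusive R range.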
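So I think you'd rewrite
subset = df[1:1000, c(1,5,14:18,24)]
as
df.iloc[:1000, np.r_[0,4,13:18,23]]
The column numbers shift down by one because Python counts from 0, and the slice ends at 18 rather than 17 because R's 14:18 includes its endpoint while a Python slice excludes it.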
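A runnable sketch of that translation on a made-up frame. The 1200 x 30 shape and the c1..c30 column names are assumptions chosen only so that the names line up with R's 1-based column numbers:

```python
import numpy as np
import pandas as pd

# Hypothetical data: 1200 rows, 30 columns named c1..c30 so that the
# names line up with R's 1-based column numbers.
df = pd.DataFrame(
    np.arange(1200 * 30).reshape(1200, 30),
    columns=[f"c{i}" for i in range(1, 31)],
)

# R's 14:18 includes column 18, so the zero-based, half-open slice is 13:18.
cols = np.r_[0, 4, 13:18, 23]
subset = df.iloc[:1000, cols]

print(subset.shape)
print(list(subset.columns))
```

The printed column names should be the 1st, 5th, 14th through 18th, and 24th, matching the R call.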
You can use iloc for integer indexing in pandas:
df.iloc[0:1000, [0, 4] + range(13,18) + [23]]
As commented by @root, in Python 3 range() no longer returns a list, so you need to convert it explicitly before concatenating:
df.iloc[0:1000, [0, 4] + list(range(13,18)) + [23]]
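Since the question also asks about plain lists, here is a minimal sketch of the same idea without numpy. It assumes Python 3.5 or later, where a range can be unpacked inside a list literal with *; the items list is a hypothetical stand-in for any sequence:

```python
# Build the positions once: single indices plus an unpacked range.
# (range(13, 18) yields 13..17, i.e. R's columns 14 through 18.)
positions = [0, 4, *range(13, 18), 23]

# Hypothetical plain list standing in for any sequence of 30 items.
items = [f"item{i}" for i in range(1, 31)]

# Plain lists do not accept a list of positions, so pick them in a loop.
picked = [items[i] for i in positions]

print(positions)
print(picked)
```

The same positions list can be passed as the column argument to df.iloc in place of the concatenated form.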
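Try this. The first set of square brackets selects the columns; the second set slices the rows. Note that df[[...]] selects columns by label, so this works only when the columns are labeled with the integers 0, 1, 2, and so on. In Python 3, range() must also be wrapped in list().
df[[0, 4] + list(range(13, 18)) + [23]][:1000]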
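One caveat with the double-bracket form is that df[[...]] looks columns up by label, while iloc looks them up by position. A small sketch of the difference, using a hypothetical frame whose integer labels are deliberately out of order:

```python
import pandas as pd

# Hypothetical frame whose integer column labels are in reverse order,
# so label 0 is the LAST column by position.
df = pd.DataFrame([[10, 20, 30], [11, 21, 31]], columns=[2, 1, 0])

by_label = df[[0, 1]]             # columns labeled 0 and 1
by_position = df.iloc[:, [0, 1]]  # first and second columns

print(list(by_label.columns))
print(list(by_position.columns))
```

When the labels are the default 0, 1, 2, ... the two forms select the same columns; otherwise iloc is the closer match to R's positional indexing.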