Subset DataFrame Columns Numpy Array in Pandas

I'm trying to subset data in a pandas DataFrame based on values that exist in a separate array. Below is a toy example that works and illustrates what I'm trying to do:

import pandas as pd
import numpy as np
mysubset = np.array([1,2,3,4])
d = {'col1': [1, 2, 3, 4, 5, 6], 'col2': [3, 4, 1, 3, 5, 5]}
df = pd.DataFrame(data=d)
df[df['col1'].isin(mysubset)]

Using that working code as a prototype, I'm implementing (what I think is) the same process on my real data, but it doesn't work. My real data look like

>>> tmp.head()
   ItemID                  P0
44  26785         0.276844507
61  26534  1.4108438640000001
71  14107  1.0652574239999999
86  26530  1.1059459039999999
93  18142         0.903011679

and the array I want to use for subsetting is

>>> op_items
array([18692, 18694, 18696, 18706, 18711, 18714, 18716, 18722, 19332,
       19333, 26526, 26527, 26530, 26532, 26533, 26534, 26535, 26536,
       26538, 26541, 14107, 14110, 14120, 14149, 14165, 17984, 18004,
       18005, 18006, 18007, 18008, 18134, 18136, 18139, 18141, 18142,
       19081, 19084, 19086, 20789, 20794, 20796, 20800, 20802, 26784,
       26785, 26786, 26787], dtype=int64) 

Using this as in the toy example above gives

>>> tmp[tmp['ItemID'].isin(op_items)]
Empty DataFrame
Columns: [ItemID, P0]
Index: []

But manually grabbing some elements from within a list does work:

>>> tmp[tmp['ItemID'].isin(['18692', '18696'])]
    ItemID           P0
236  18696  0.566035305
624  18692   0.60981902

Checking the types confirms they have the same form as in the toy example:

>>> type(op_items)
<class 'numpy.ndarray'>
>>> type(tmp['ItemID'])
<class 'pandas.core.series.Series'>

So I am uncertain what other mistake I am making and could use a pointer. I realize that in the hardcoded example I passed the values as a list. But the toy example above uses isin where mysubset is an array similar to op_items.

Thank you. My question differs from this one, "subset pandas dataframe with corresponding numpy array", in that I'm not worried about duplicates.

Your op_items is an array of integers, whereas your tmp['ItemID'] holds strings. That is why the hardcoded list of strings ['18692', '18696'] matched while the integer array did not. Use:

tmp['ItemID'] = tmp['ItemID'].astype('Int64')

tmp[tmp['ItemID'].isin(op_items)]
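
To see the mismatch for yourself, here is a minimal sketch. The small tmp and op_items below are hypothetical stand-ins for the real data (assuming, as above, that ItemID was loaded as strings), and pd.to_numeric is just one possible way to do the conversion, not necessarily the best one for your data. Note that type() only reports the container; it is the element dtype that differs.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real data: ItemID holds strings, not integers.
tmp = pd.DataFrame(
    {
        "ItemID": ["26785", "26534", "14107", "99999"],
        "P0": ["0.276844507", "1.4108438640000001", "1.0652574239999999", "0.5"],
    },
    index=[44, 61, 71, 86],
)
op_items = np.array([26534, 14107, 26785], dtype=np.int64)

# type() says Series vs ndarray; the dtypes show what the elements really are.
print(tmp["ItemID"].dtype)
print(op_items.dtype)

# A string never equals an integer, so isin finds no matches.
no_match = tmp[tmp["ItemID"].isin(op_items)]

# Option 1: compare integer to integer by converting the column first.
item_ids = pd.to_numeric(tmp["ItemID"])
matched_numeric = tmp[item_ids.isin(op_items)]

# Option 2: leave the column as strings and convert the array instead.
matched_strings = tmp[tmp["ItemID"].isin(op_items.astype(str).tolist())]

print(no_match)
print(matched_numeric)
print(matched_strings)
```

Either option should select the same rows. Converting the column is usually the better choice if you also need to do arithmetic or sorting on ItemID later.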
