
Subset pandas dataframe with corresponding numpy array

I have a pandas dataframe with the following columns.

    order_id   latitude
0       519  19.119677
1       519  19.119677
2       520  19.042117
3       520  19.042117
4       520  19.042117
5       521  19.138245
6       523  19.117662
7       523  19.117662
8       523  19.117662
9       523  19.117662
10      523  19.117662
11      524  19.137793
12      525  19.119372
13      526   0.000000
14      526   0.000000
15      526   0.000000
16      527  19.133430
17      528   0.000000
18      529  19.118284
19      530   0.000000
20      531  19.114269
21      531  19.114269
22      532  19.136292
23      533  19.119075
24      533  19.119075
25      533  19.119075
26      534  19.119677
27      535  19.119677
28      535  19.119677
29      535  19.119677

order_id is repeated. I want the unique order_id values, which I can get with:

unique_order_id = pd.unique(tsp_data['order_id'])

array(['519', '520', '521', '523', '524', '525', '526', '527', '528',
   '529', '530', '531', '532', '533', '534', '535'], dtype=object)

This returns the correct unique values, which I store in the unique_order_id variable. Now I want only the corresponding latitude value for each unique order_id.

I am doing something like this:

tsp_data['latitude'][tsp_data['order_id'].isin(unique_order_id)]

But it returns all 30 rows. Where am I going wrong? Please help.

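You could use pivot_table, which aggregates latitude per order_id. Its default aggfunc is the mean rather than the first value, but since latitude is identical within each order_id here, that amounts to one latitude per order: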

In [184]: tsp_data.pivot_table(index='order_id', values='latitude')
Out[184]:
order_id
519    19.119677
520    19.042117
521    19.138245
523    19.117662
524    19.137793
525    19.119372
526     0.000000
527    19.133430
528     0.000000
529    19.118284
530     0.000000
531    19.114269
532    19.136292
533    19.119075
534    19.119677
535    19.119677
Name: latitude, dtype: float64

Or you could use drop_duplicates, which keeps the first row for each order_id:

In [185]: tsp_data.drop_duplicates(subset=['order_id'])
Out[185]:
    order_id   latitude
0        519  19.119677
2        520  19.042117
5        521  19.138245
6        523  19.117662
11       524  19.137793
12       525  19.119372
13       526   0.000000
16       527  19.133430
17       528   0.000000
18       529  19.118284
19       530   0.000000
20       531  19.114269
22       532  19.136292
23       533  19.119075
26       534  19.119677
27       535  19.119677
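As for why the original attempt returned all 30 rows: every order_id is necessarily a member of its own set of unique values, so the isin mask is True everywhere. The sketch below illustrates this on a small stand-in for tsp_data (a few rows copied from the question, with order_id as strings to match the dtype=object output above) and then uses drop_duplicates to get a latitude array lined up with unique_order_id. The names mask, first_rows and unique_latitude are only illustrative.

```python
import pandas as pd

# Small stand-in for tsp_data, built from a few rows of the question.
tsp_data = pd.DataFrame({
    "order_id": ["519", "519", "520", "520", "520", "521", "526", "526"],
    "latitude": [19.119677, 19.119677, 19.042117, 19.042117, 19.042117,
                 19.138245, 0.0, 0.0],
})

unique_order_id = pd.unique(tsp_data["order_id"])

# Each order_id appears in the unique values taken from the same column,
# so the mask is True for every row and nothing gets filtered out.
mask = tsp_data["order_id"].isin(unique_order_id)
all_rows_kept = bool(mask.all())

# drop_duplicates keeps the first row per order_id, in order of first
# appearance, which is the same order pd.unique returns.
first_rows = tsp_data.drop_duplicates(subset=["order_id"])
unique_latitude = first_rows["latitude"].to_numpy()
```

Here unique_latitude has one entry per element of unique_order_id, in the same order.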

Or use groupby, as @EdChum suggested.
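The groupby suggestion is not spelled out here, so the following is only a guess at what it might look like: group on order_id and take the first latitude in each group. The sample frame is a small stand-in for tsp_data, and first_latitude, order_ids and latitudes are illustrative names.

```python
import pandas as pd

# Small stand-in for tsp_data, built from a few rows of the question.
tsp_data = pd.DataFrame({
    "order_id": ["519", "519", "520", "520", "520", "521", "526", "526"],
    "latitude": [19.119677, 19.119677, 19.042117, 19.042117, 19.042117,
                 19.138245, 0.0, 0.0],
})

# sort=False keeps the groups in order of first appearance instead of
# sorting the group keys.
first_latitude = tsp_data.groupby("order_id", sort=False)["latitude"].first()

# Plain numpy arrays: one order_id and one latitude per group.
order_ids = first_latitude.index.to_numpy()
latitudes = first_latitude.to_numpy()
```

Unlike pivot_table with its default mean, first() returns an actual value from the data, which matters if latitude could differ within an order_id.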
