Pandas Data Frame find index according to column value

Question

I have a data frame, let's say "df". One of its columns is named "itemID". I would like to get, very quickly, the row index that corresponds to a given value in the "itemID" column.

When I do:

df[df['itemID']==X]

The performance is quite slow.

Is there a way to create something like a hash-index in order to do the above?

Answer 1

I believe you can use dask.

The docs say:

The following class of computations works well:

Trivially parallelizable operations (fast):

Row-wise selections: df[df.x > 0]

You can also check how to create Dask DataFrames.

Example

import pandas as pd
import dask.dataframe as dd

df = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                   'itemID': [1,2,4,4]})

print(df)
    A  itemID
0  A0       1
1  A1       2
2  A2       4
3  A3       4

# Construct a dask object from a pandas object
df_dask = dd.from_pandas(df, npartitions=3)

# Row-wise selection
print(df_dask[df_dask.itemID == 4].compute())
    A  itemID
2  A2       4
3  A3       4
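
The selection above returns the matching rows rather than their index. As a minimal pandas-only sketch (not part of the original answer, and not benchmarked here), the row labels can be read from the index directly, and a value-to-labels mapping can be built once and reused, which is roughly the "hash-index" idea from the question. The names matches and lookup are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                   'itemID': [1, 2, 4, 4]})

# One-off lookup: row labels where itemID equals 4.
# This still scans the whole column on every call.
matches = df.index[df['itemID'] == 4].tolist()
print(matches)  # [2, 3]

# Repeated lookups: build the mapping itemID -> row labels once.
# GroupBy.groups is a dict-like keyed by the distinct column values,
# so each later lookup is a dictionary access instead of a column scan.
lookup = df.groupby('itemID').groups
print(list(lookup[4]))  # [2, 3]
```

Building the mapping costs one pass over the column, so it is likely to pay off only when many values are looked up against the same data frame.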

solution1
1 2016-06-06 13:29:41
