
Categorical column with list values in a DataFrame

I have a DataFrame with columns of different types: float, int, and object.

Since the dataset is too large, I am looking for ways to reduce its memory usage.

I found that the "category" dtype can reduce the memory usage of "object" columns, so I applied it to them. But when I convert a column that holds list values, I get "TypeError: unhashable type: 'list'".
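A small sketch of the effect described above, using `Series.memory_usage(deep=True)` to compare an object column with its categorical version. The data is invented for illustration and deliberately has only a few distinct values, which is the case where the category dtype pays off.

```python
import pandas as pd

# Hypothetical column: 100,000 rows drawn from only four distinct strings
suits = pd.Series(['hearts', 'spades', 'diamonds', 'clubs'] * 25_000)

# deep=True also counts the Python string objects, not just the pointers
as_object = suits.memory_usage(deep=True)
as_category = suits.astype('category').memory_usage(deep=True)

# A categorical column stores each distinct string once, plus a small
# integer code per row (int8 here, since there are only four categories)
print(f"object:   {as_object:,} bytes")
print(f"category: {as_category:,} bytes")
```

The exact byte counts depend on the pandas and Python versions, so the sketch only prints them rather than relying on specific numbers.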

Here is my DataFrame:

import pandas as pd

vs_df = pd.DataFrame({'plan_name': ['abc', 'def'], 'plan_id': [10001, 10002]})
# Each cell of this column holds a Python list of plan IDs
vs_df['handled_plans_id'] = [[105, 120], []]
print(vs_df)

vs_df['handled_plans_id'] = vs_df['handled_plans_id'].astype('category')  # Error here

     plan_id plan_name handled_plans_id
0    10001       abc       [105, 120]
1    10002       def               []

Error:

TypeError: unhashable type: 'list'
File "pandas\_libs\hashtable_class_helper.pxi", line 1367, in pandas._libs.hashtable.PyObjectHashTable.get_labels
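The error comes from the conversion itself: to build the set of categories, pandas hashes every value, and Python lists are unhashable because they are mutable. A minimal reproduction with the question's sample values, plus a check with `collections.abc.Hashable`:

```python
from collections.abc import Hashable

import pandas as pd

col = pd.Series([[105, 120], []])

# astype('category') must hash every value to find the distinct categories
try:
    col.astype('category')
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)

# Lists are mutable and unhashable; tuples are immutable and hashable
print(isinstance([105, 120], Hashable))  # False
print(isinstance((105, 120), Hashable))  # True
```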

Any methods that solve this, or that otherwise reduce the size of this list-valued column, are appreciated. Thanks!

Update

Many of the values in the handled_plans_id column are distinct. I would like to see any method that reduces the memory usage of this column.

Use tuples instead of lists. Tuples are immutable and therefore hashable, so pandas can use them as categories:

import pandas as pd

vs_df = pd.DataFrame({'plan_name': ['abc', 'def'], 'plan_id': [10001, 10002]})
# Store tuples instead of lists: tuples are hashable, so they can be categories
vs_df['handled_plans_id'] = [(105, 120), ()]
vs_df['handled_plans_id'] = vs_df['handled_plans_id'].astype('category')
print(vs_df)

   plan_id plan_name handled_plans_id
0    10001       abc       (105, 120)
1    10002       def               ()
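If the column already holds lists, as in the question, a minimal way to apply the tuple approach is to convert each cell with `Series.map(tuple)` before calling `astype('category')`. This sketch reuses the question's sample data:

```python
import pandas as pd

vs_df = pd.DataFrame({'plan_name': ['abc', 'def'], 'plan_id': [10001, 10002]})
vs_df['handled_plans_id'] = [[105, 120], []]

# Turn every list into an immutable, hashable tuple, then categorize
vs_df['handled_plans_id'] = vs_df['handled_plans_id'].map(tuple).astype('category')

print(vs_df.dtypes)
print(vs_df['handled_plans_id'].cat.categories.tolist())
```

This only saves memory when many rows share the same tuple; with mostly distinct tuples, each one still becomes its own category.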

If the tuples have a known maximum length, you could split them into separate columns. Categorical should then help more, although not if there are too many distinct numbers. Categorical data is usually drawn from a small set of values, an enumeration such as 'hearts', 'spades', 'diamonds', 'clubs'. If a column has too many distinct values, converting it to a category won't help much.
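One way to split tuples of known maximum length into fixed columns is to build a new DataFrame from the list of tuples; shorter tuples are padded with missing values. The sketch below assumes a maximum length of 2 and uses the nullable `Int32` dtype (an assumption, chosen so the padding does not force the IDs to float). The `handled_plan_` column names are hypothetical.

```python
import pandas as pd

vs_df = pd.DataFrame({
    'plan_name': ['abc', 'def'],
    'plan_id': [10001, 10002],
    'handled_plans_id': [(105, 120), ()],
})

# One column per tuple position; positions a tuple lacks become missing
split = pd.DataFrame(vs_df['handled_plans_id'].tolist(), index=vs_df.index)
split.columns = [f'handled_plan_{i}' for i in range(split.shape[1])]

# Nullable integers keep the IDs as integers while allowing missing values
split = split.astype('Int32')

# Replace the tuple column with the fixed-width integer columns
vs_df = vs_df.drop(columns='handled_plans_id').join(split)
print(vs_df)
```

If a split column repeats a handful of IDs many times, it can additionally be converted with `astype('category')`; with mostly distinct IDs, the compact integer dtype alone is likely to be enough.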


If the file is too large to fit in memory, you can also process it in chunks.
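A minimal sketch of chunked processing with `pd.read_csv(..., chunksize=...)`, which returns an iterator of DataFrames instead of loading the whole file at once. An in-memory `io.StringIO` stands in for the real file, and the per-chunk work (counting rows per `plan_name`) is a placeholder assumption.

```python
import io

import pandas as pd

# Stand-in for a large CSV file on disk (hypothetical data)
csv_data = io.StringIO(
    "plan_name,plan_id\n"
    "abc,10001\n"
    "def,10002\n"
    "abc,10003\n"
    "ghi,10004\n"
    "abc,10005\n"
)

partial_counts = []

# chunksize makes read_csv yield DataFrames of at most 2 rows each, so only
# one chunk is held in memory at a time. Narrower per-column dtypes can
# also be requested while parsing.
for chunk in pd.read_csv(csv_data, chunksize=2, dtype={'plan_id': 'int32'}):
    partial_counts.append(chunk['plan_name'].value_counts())

# Combine the per-chunk results into one total per plan_name
totals = pd.concat(partial_counts).groupby(level=0).sum()
print(totals)
```

The pattern works when each chunk can be reduced to a small partial result that is combined at the end; operations that need the whole column at once do not fit it directly.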
