I have
df.shape
> (12702, 27)
df['x'][0]
>{'a': '123',
'b': '214',
'c': '654',}
I try:
df['x'].unique()
>TypeError: unhashable type: 'list'
Is it possible to find the unique values among the keys of the dictionaries?
Or should I use dummies?
(Providing an answer similar to https://stackoverflow.com/a/12897477/15744261 )
Edited:
Sorry for the multiple edits, I misunderstood your question.
From your snippet, it looks like df['x'] holds a list of dictionaries. If you want all the unique keys across some of those dictionaries, you can concatenate their keys using list(my_dict) (which returns a list of that dictionary's keys), then wrap the combined list in a set to remove the duplicates. Example:
values = set(list(df['x'][0]) + list(df['x'][1]) + ... )
If you need the unique keys across all of these dictionaries, you could use a list comprehension to gather every key and then wrap the result in a set to keep only the unique ones.
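The comprehension described above could look like the following sketch. The helper name unique_keys and the sample list rows are hypothetical; rows stands in for df['x'], on the assumption that iterating over a pandas Series of dictionaries yields the dictionaries themselves.

```python
def unique_keys(dicts):
    """Collect every distinct key found in an iterable of dictionaries."""
    # Iterating over a dict yields its keys, so the nested loop
    # visits every key of every dictionary; the set drops repeats.
    return {key for d in dicts for key in d}


# Hypothetical sample data standing in for df['x'].
rows = [
    {'a': '123', 'b': '214', 'c': '654'},
    {'b': '999', 'd': '111'},
]

keys = unique_keys(rows)
print(sorted(keys))  # ['a', 'b', 'c', 'd']
```

A set comprehension is used here instead of building a list first, which avoids holding the duplicated keys in memory. With the real data, the call would be unique_keys(df['x']).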
Old answer:
For a list, you can simply convert it to a set, which removes the duplicates:
values = set(df['x'][0])
If you want to use these values as a list you can convert that set into a list as well:
list_values = list(values)
Or in one line:
values = list(set(df['x'][0]))
Keep in mind, this is certainly not the most efficient way to do this. I'm sure there are better ways to do it if you're dealing with a large amount of data.
It seems that you want to find the unique keys across all the dictionaries in this column. This can be done easily with functools.reduce. I've generated some sample data:
import pandas as pd
import random
possible_keys = 'abcdefg'
df = pd.DataFrame({'x': [{key: 1 for key in random.choices(possible_keys, k=3)} for _ in range(10)]})
This dataframe looks like this:
x
0 {'c': 1, 'a': 1}
1 {'b': 1, 'd': 1, 'c': 1}
2 {'d': 1, 'b': 1}
3 {'b': 1, 'f': 1, 'e': 1}
4 {'a': 1, 'd': 1, 'c': 1}
5 {'g': 1, 'b': 1}
6 {'d': 1}
7 {'e': 1}
8 {'c': 1, 'd': 1, 'f': 1}
9 {'b': 1, 'a': 1, 'f': 1}
Now the actual meat of the answer:
from functools import reduce
reduce(set.union, df['x'], set())
Results in:
{'a', 'b', 'c', 'd', 'e', 'f', 'g'}
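This works because set.union accepts any iterable as its extra arguments, and iterating over a dictionary yields its keys; the set() initial value ensures the first argument is a real set. As a rough sketch, the same result can be written without reduce. The list rows below is hypothetical sample data standing in for df['x'].

```python
from itertools import chain

# Hypothetical sample data standing in for df['x'].
rows = [
    {'c': 1, 'a': 1},
    {'b': 1, 'd': 1, 'c': 1},
    {'d': 1},
]

# Option 1: unpack every dictionary as an argument to a single union call.
keys_union = set().union(*rows)

# Option 2: chain the keys of all dictionaries into one stream, then deduplicate.
keys_chain = set(chain.from_iterable(rows))

print(sorted(keys_union))  # ['a', 'b', 'c', 'd']
print(keys_union == keys_chain)  # True
```

Both options make a single pass over the keys, whereas reduce builds a new intermediate set at every step.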