
Is it possible to find the unique values within the keys of the dictionaries?

I have a DataFrame whose column 'x' holds dictionaries:

df.shape
> (12702, 27)

df['x'][0]
>{'a': '123',
  'b': '214', 
  'c': '654',}

I tried:

df['x'].unique()
>TypeError: unhashable type: 'list'

Is it possible to find the unique values within the keys of these dictionaries?

Or should I use dummies?

(Providing an answer similar to https://stackoverflow.com/a/12897477/15744261)

Edited:

Sorry for the multiple edits; I misunderstood your question.

From your snippet, it looks like your df['x'] is returning a list of dictionaries. If what you're asking for is all the unique keys across some of those dictionaries, you can get each dictionary's keys with list(my_dict) (which returns a list of its keys), concatenate those lists, and then wrap the result in set() to keep only the unique keys. Example:

values = set(list(df['x'][0]) + list(df['x'][1]))  # add + list(df['x'][2]) and so on for more rows

If you need the unique keys across all of these dictionaries, you can use a comprehension to collect every key from every row and then wrap that in a set to keep only the unique ones.
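A minimal sketch of that comprehension approach, assuming every cell of df['x'] is a dict. The sample frame is hypothetical, modelled on the one in the question:

```python
import pandas as pd

# Hypothetical sample data: each cell of column 'x' holds a dict
df = pd.DataFrame({'x': [{'a': '123', 'b': '214', 'c': '654'},
                         {'a': '1', 'd': '2'},
                         {'e': '3', 'b': '4'}]})

# Nested comprehension: for every dict in the column, take each of its keys
all_keys = [key for d in df['x'] for key in d]

# Wrapping the flat list in set() drops the duplicate keys
unique_keys = set(all_keys)

print(sorted(unique_keys))  # ['a', 'b', 'c', 'd', 'e']
```

Writing it directly as a set comprehension, {key for d in df['x'] for key in d}, does the flattening and the deduplication in one step.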

Old answer:

For a list, you can simply convert it to a set, which removes the duplicates:

values = set(df['x'][0])

If you want to use these values as a list, you can convert that set back into a list:

list_values = list(values)

Or in one line:

values = list(set(df['x'][0]))
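A quick illustration on a made-up list of keys (the list itself is hypothetical). A set has no defined order, so the list you get back from it may come out in any order; sorting, or dict.fromkeys if you want first-seen order, makes the result predictable:

```python
# Hypothetical list of keys containing duplicates
keys = ['b', 'a', 'b', 'c', 'a']

values = set(keys)          # duplicates removed; a set is unordered
list_values = list(values)  # back to a list, in arbitrary order

# Sort when you need a stable, reproducible order
ordered = sorted(set(keys))
print(ordered)  # ['a', 'b', 'c']

# dict.fromkeys also drops duplicates but keeps first-seen order
# (dicts preserve insertion order since Python 3.7)
first_seen = list(dict.fromkeys(keys))
print(first_seen)  # ['b', 'a', 'c']
```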

Keep in mind that this is certainly not the most efficient way to do it; there are likely better approaches if you're dealing with a large amount of data.
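One lower-overhead option, sketched here under the assumption that every cell of df['x'] is a dict (sample data hypothetical, not benchmarked), is to stream all keys into a single set with itertools.chain.from_iterable instead of building and concatenating intermediate lists:

```python
from itertools import chain

import pandas as pd

# Hypothetical sample data: one dict per row
df = pd.DataFrame({'x': [{'a': 1, 'b': 1}, {'b': 1, 'c': 1}, {'d': 1}]})

# chain.from_iterable walks each dict in turn; iterating a dict yields its keys,
# so the set is filled in a single pass with no intermediate lists
unique_keys = set(chain.from_iterable(df['x']))

print(sorted(unique_keys))  # ['a', 'b', 'c', 'd']
```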

It seems that you want to find the unique keys across all the dictionaries in this column. This can be done with functools.reduce. I've generated some sample data:

import pandas as pd
import random

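possible_keys = 'abcdefg'

# 10 rows, each a dict built from 3 random picks; random.choices samples with
# replacement, so repeated picks collapse and a dict may end up with 1 to 3 keys
df = pd.DataFrame({'x': [{key: 1 for key in random.choices(possible_keys, k=3)} for _ in range(10)]})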

This dataframe looks like this:

                          x
0          {'c': 1, 'a': 1}
1  {'b': 1, 'd': 1, 'c': 1}
2          {'d': 1, 'b': 1}
3  {'b': 1, 'f': 1, 'e': 1}
4  {'a': 1, 'd': 1, 'c': 1}
5          {'g': 1, 'b': 1}
6                  {'d': 1}
7                  {'e': 1}
8  {'c': 1, 'd': 1, 'f': 1}
9  {'b': 1, 'a': 1, 'f': 1}

Now the actual meat of the answer:

from functools import reduce

reduce(set.union, df['x'], set())

Results in:

{'a', 'b', 'c', 'd', 'e', 'f', 'g'}
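Why the set() initializer matters, sketched on a small hypothetical frame: reduce starts from set(), so every step is set.union(<set>, <dict>), and set.union accepts any iterable, which for a dict means its keys. Without the initializer the first step would apply set.union to two dicts, which raises TypeError. The same result is also available without reduce by unpacking every dict into one union call:

```python
from functools import reduce

import pandas as pd

# Hypothetical sample data in the same shape as above
df = pd.DataFrame({'x': [{'c': 1, 'a': 1}, {'b': 1, 'd': 1}, {'e': 1}]})

# Accumulator starts as an empty set; each step unions in one dict's keys
keys = reduce(set.union, df['x'], set())

# Without the initializer the first call is set.union(<dict>, <dict>),
# which fails because a dict is not a set
try:
    reduce(set.union, df['x'])
except TypeError as exc:
    print('TypeError:', exc)

# Equivalent one-liner: pass every dict to a single union call
same_keys = set().union(*df['x'])

print(sorted(keys))  # ['a', 'b', 'c', 'd', 'e']
```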
