How to get the unique values of a dataframe column when it contains lists - python

Question

I have the following dataframe, and I want to print the unique values of its colors column.

import pandas as pd
df = pd.DataFrame({'colors': ['green', 'green', 'purple', ['yellow , red'], 'orange'], 'names': ['Terry', 'Nor', 'Franck', 'Pete', 'Agnes']})

Output:
           colors   names
0           green   Terry
1           green     Nor
2          purple  Franck
3  [yellow , red]    Pete
4          orange   Agnes

Without the [yellow , red] row, df.colors.unique() works fine. With it, I keep getting TypeError: unhashable type: 'list', which is understandable.

Is there a way to still get the unique values while ignoring this row?

I tried the following, but none of it worked:

df = df[~df.colors.str.contains(',', na=False)] # Nothing happens
df = df[~df.colors.str.contains('[', na=False)] # Output: error: unterminated character set at position 0
df = df[~df.colors.str.contains(']', na=False)] # Nothing happens

Answer 1

Check whether each value is a list with isinstance, and keep only the rows where it is not:

#changed sample data
df = pd.DataFrame({'colors': ['green', 'green', 'purple', ['yellow' , 'red'], 'orange'], 
                   'names': ['Terry', 'Nor', 'Franck', 'Pete', 'Agnes']})

df = df[~df.colors.map(lambda x : isinstance(x, list))]
print (df)
   colors   names
0   green   Terry
1   green     Nor
2  purple  Franck
4  orange   Agnes

Your own solution works once the values are converted to strings and regex=False is passed:

df = df[~df.colors.astype(str).str.contains('[', na=False, regex=False)] 
print (df)
   colors   names
0   green   Terry
1   green     Nor
2  purple  Franck
4  orange   Agnes
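
To see why the attempts in the question silently did nothing, it helps to inspect the mask that str.contains produces. The following is a minimal sketch on a two-row series (the names s, mask and fixed are illustrative, not from the original posts). The .str accessor cannot search inside a list element, so with na=False that row evaluates to False and is never filtered out. Casting to str first makes the bracket visible.

```python
import pandas as pd

# Two-row sample: one plain string and one real Python list.
s = pd.Series(['green', ['yellow , red']])

# The .str accessor cannot search inside the list element,
# so with na=False that row becomes False and ~mask keeps every row.
mask = s.str.contains(',', na=False)
print(mask.tolist())

# After casting to str, the list is rendered as "['yellow , red']",
# so a literal (regex=False) search for '[' finds it.
fixed = s.astype(str).str.contains('[', na=False, regex=False)
print(fixed.tolist())
```

The error in the second attempt has a separate cause: '[' on its own is an incomplete regular expression, which is why regex=False is needed.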

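Also, if you want all unique values, including those inside lists (pandas 0.25+; run this on the sample data before the list row is filtered out):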

s = df.colors.map(lambda x : x if isinstance(x, list) else [x]).explode().unique().tolist()
print (s)
['green', 'purple', 'yellow', 'red', 'orange']
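
As a self-contained variant of the snippet above, here is a short sketch that rebuilds the changed sample data and calls explode directly. It assumes pandas 0.25 or later; since Series.explode leaves scalar values unchanged, wrapping each scalar in a list first is optional. The name unique_colors is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'colors': ['green', 'green', 'purple', ['yellow', 'red'], 'orange'],
    'names': ['Terry', 'Nor', 'Franck', 'Pete', 'Agnes'],
})

# explode() gives each list element its own row and keeps scalars unchanged,
# so every entry is hashable by the time unique() runs.
unique_colors = df['colors'].explode().unique().tolist()
print(unique_colors)
# ['green', 'purple', 'yellow', 'red', 'orange']
```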

Answer 2

Let's use type to build a boolean mask of the non-list rows:

df.colors.apply(lambda x : type(x)!=list)
0     True
1     True
2     True
3    False
4     True
Name: colors, dtype: bool
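
The mask above only marks the rows. One possible way to finish the job is to use it to select the non-list rows and then call unique on them. This is a sketch using the changed sample data from Answer 1; the names not_list and unique_colors are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'colors': ['green', 'green', 'purple', ['yellow', 'red'], 'orange'],
    'names': ['Terry', 'Nor', 'Franck', 'Pete', 'Agnes'],
})

# True for every row whose value is not a list
not_list = df['colors'].apply(lambda x: type(x) != list)

# Select only those rows, then take the unique values
unique_colors = df.loc[not_list, 'colors'].unique().tolist()
print(unique_colors)
# ['green', 'purple', 'orange']
```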

Answer 3

Change the input sample

The input as posted contains a string that represents a list, so it is converted here to a list of strings.

# Required Import
from ast import literal_eval

df = pd.DataFrame({
    'colors': ['green', 'green', 'purple', "['yellow' , 'red']", 'orange'], 
    'names': ['Terry', 'Nor', 'Franck', 'Pete', 'Agnes']
})

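Apply literal_eval. For more information, see literal_eval.

literal_eval is applied only to the rows where a list is stored as a string, converting those strings into actual lists.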

list_records = df.colors.str.contains('[', na=False, regex=False)
df.loc[list_records, 'colors'] = df.loc[list_records, 'colors'].apply(literal_eval)
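
The reason for restricting literal_eval to the bracketed rows can be shown with a small standalone sketch (the names parsed and failed are illustrative): a string holding a list literal parses cleanly, while a bare word such as green is not a Python literal and raises ValueError.

```python
from ast import literal_eval

# A string that holds a list literal is parsed into a real list.
parsed = literal_eval("['yellow' , 'red']")
print(parsed)
# ['yellow', 'red']

# A bare word is not a Python literal, so parsing it fails.
try:
    literal_eval('green')
    failed = False
except ValueError:
    failed = True
print(failed)
# True
```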

Unique colors

Works with pandas >= 0.25

df.explode('colors')['colors'].unique()

This gives

['green', 'purple', 'yellow', 'red', 'orange']

Answer 4

Assuming every value in the dataframe matters, here is a technique I often use to "unpack lists":

import re

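def unlock_list_from_string(string, delim=','):
    """
    Lists are stored as strings (in csv files), e.g. '[1,2,3]'.
    This function converts such a string back into a list.
    """
    # leave non-string values (e.g. real lists) untouched
    if not isinstance(string, str):
        return string

    # remove brackets (raw string so the backslashes reach the regex engine)
    clean_string = re.sub(r'\[|\]', '', string)
    # split on the delimiter and trim whitespace around each item
    unlocked_string = clean_string.split(delim)
    unlocked_list = [x.strip() for x in unlocked_string]
    return unlocked_list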

all_colors_nested = df['colors'].apply(unlock_list_from_string)
# unnest
all_colors = [x for y in all_colors_nested for x in y ]

print(all_colors)
# ['green', 'green', 'purple', 'yellow', 'red', 'orange']
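
The flattened list above still contains duplicates. One way to reduce it to unique values while keeping first-seen order is dict.fromkeys. This is a sketch that starts from the printed result; the name unique_colors is illustrative.

```python
# Flattened result from the step above, duplicates included
all_colors = ['green', 'green', 'purple', 'yellow', 'red', 'orange']

# dict keys are unique and keep insertion order (Python 3.7+)
unique_colors = list(dict.fromkeys(all_colors))
print(unique_colors)
# ['green', 'purple', 'yellow', 'red', 'orange']
```

A plain set(all_colors) would also remove the duplicates, but it does not preserve the original order.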

4 answers

Answer 1: score 3, accepted, 2019-10-17 13:52:08
Answer 2: score 2, 2019-10-17 13:49:27
Answer 3: score 1, 2019-10-17 14:08:22
Answer 4: score 1, 2019-10-17 14:11:00