How to get a list of unique pandas dataframe column elements?

Question

I am trying to get a list of titles for each unique string in a pandas dataframe column:

import pandas as pd

catalog = {'code': ['A001', 'A001', 'A001', 'A002', 'A002'], 'title': ['director', 'president', 'vice president', 'sales director', 'sales vice president']}

catalog=pd.DataFrame(catalog)

## unique column values ##
codes = catalog['code'].unique()

for code in codes:
     titles = catalog[catalog == code]['title'].tolist()
     print(titles)

Which gives the following output:

[nan, nan, nan, nan, nan]
[nan, nan, nan, nan, nan]

The expected output would look like this:

['director', 'president', 'vice president']
['sales director', 'sales vice president']

What am I missing? Is there any other way to accomplish this task?

Answer 1

Try groupby with unique:

catalog.groupby('code')['title'].unique()
code
A001     [director, president, vice president]
A002    [sales director, sales vice president]
Name: title, dtype: object
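
Note that unique() returns one array per group rather than a plain Python list. If plain lists are needed, as in the question's expected output, the result can be converted afterwards. A minimal sketch, assuming the sample catalog from the question; the name titles_by_code is only illustrative:

```python
import pandas as pd

catalog = pd.DataFrame({
    'code': ['A001', 'A001', 'A001', 'A002', 'A002'],
    'title': ['director', 'president', 'vice president',
              'sales director', 'sales vice president'],
})

# One array of distinct titles per code.
unique_titles = catalog.groupby('code')['title'].unique()

# Convert each array to a plain list, keyed by code.
titles_by_code = {code: list(titles) for code, titles in unique_titles.items()}

for code, titles in titles_by_code.items():
    print(code, titles)
```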

Answer 2

Instead of iterating through the unique codes, it's easier to use a groupby:

catalog.groupby("code").title.apply(list)

code
A001    [director, president, vice president]
A002    [sales director, sales vice president]
Name: title, dtype: object
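
On the question's data this gives the same titles as unique(), but the two differ once a title repeats within a code: apply(list) keeps every row, while unique() drops the repeats. A small sketch with hypothetical data (the duplicated 'director' row is not in the original catalog):

```python
import pandas as pd

# Hypothetical catalog in which 'director' appears twice under A001.
catalog = pd.DataFrame({
    'code': ['A001', 'A001', 'A001', 'A002'],
    'title': ['director', 'director', 'president', 'sales director'],
})

titles = catalog.groupby('code')['title']

# apply(list) keeps every row of the group, duplicates included.
all_titles = titles.apply(list)

# unique() keeps only the first occurrence of each title.
distinct_titles = titles.unique().apply(list)

print(all_titles['A001'])
print(distinct_titles['A001'])
```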

Answer 3

Your code compares the full dataframe with code (catalog == code) when assigning the titles variable, instead of comparing only the 'code' column. Compare against the column instead:

for code in codes:
    titles = catalog[catalog['code'] == code]['title'].tolist()
    print(titles)

Or, using .loc:

for code in codes:
    titles = catalog.loc[catalog['code'] == code,'title'].tolist()
    print(titles)

['director', 'president', 'vice president']
['sales director', 'sales vice president']
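
To see where the nan values in the question come from, it may help to look at the two kinds of mask side by side. A minimal sketch, assuming the sample catalog from the question; the variable names are only illustrative:

```python
import pandas as pd

catalog = pd.DataFrame({
    'code': ['A001', 'A001', 'A001', 'A002', 'A002'],
    'title': ['director', 'president', 'vice president',
              'sales director', 'sales vice president'],
})

# Comparing the whole dataframe gives a boolean dataframe, one flag per cell.
cell_mask = catalog == 'A001'

# Indexing with a boolean dataframe keeps the shape and blanks out
# every cell whose flag is False. No title equals 'A001', so the
# whole 'title' column becomes missing.
masked = catalog[cell_mask]
print(masked)

# Comparing a single column gives a boolean Series, one flag per row,
# which selects whole rows instead.
row_mask = catalog['code'] == 'A001'
selected = catalog[row_mask]
print(selected)
```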

3 answers

Answer 1
4 2021-04-22 14:57:38

Answer 2
3 2021-04-22 14:56:42

Answer 3
3 (accepted) 2021-04-22 14:58:21
