How to get a list for unique pandas dataframe column elements?
I am trying to get a list for each unique string in a pandas dataframe column:
import pandas as pd
catalog = {'code': ['A001', 'A001', 'A001', 'A002', 'A002'], 'title': ['director', 'president', 'vice president', 'sales director', 'sales vice president']}
catalog=pd.DataFrame(catalog)
## unique column values ##
codes = catalog['code'].unique()
for code in codes:
    titles = catalog[catalog == code]['title'].tolist()
    print(titles)
Which gives the following output:
[nan, nan, nan, nan, nan]
[nan, nan, nan, nan, nan]
The expected output would look like this:
['director', 'president', 'vice president']
['sales director', 'sales vice president']
What am I missing? Is there any other way to accomplish this task?
Try with:
catalog.groupby('code')['title'].unique()
code
A001 [director, president, vice president]
A002 [sales director, sales vice president]
Name: title, dtype: object
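Note that the values in this result are arrays rather than Python lists. If plain lists are needed, as in the expected output of the question, one possible follow-up step is sketched below. The name titles_by_code is only illustrative and not part of the original answer.

```python
import pandas as pd

catalog = pd.DataFrame({
    'code': ['A001', 'A001', 'A001', 'A002', 'A002'],
    'title': ['director', 'president', 'vice president',
              'sales director', 'sales vice president'],
})

# One array of distinct titles per code, in order of first appearance
unique_titles = catalog.groupby('code')['title'].unique()

# Turn each array into a plain Python list, keyed by code
# (titles_by_code is a hypothetical name chosen for this sketch)
titles_by_code = {code: list(arr) for code, arr in unique_titles.items()}

for code, titles in titles_by_code.items():
    print(code, titles)
```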
Instead of iterating through the unique codes, it's easier to use a groupby:
catalog.groupby("code").title.apply(list)
code
A001 [director, president, vice president]
A002 [sales director, sales vice president]
Name: title, dtype: object
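For the sample data, apply(list) and unique() show the same titles, but they should differ once a title repeats within a code: apply(list) keeps every row, while unique() drops repeats. A small sketch with made-up data (the frame df and its duplicated row are assumptions for illustration only):

```python
import pandas as pd

# Hypothetical data with a repeated title under A001
df = pd.DataFrame({
    'code': ['A001', 'A001', 'A001', 'A002'],
    'title': ['director', 'director', 'president', 'sales director'],
})

as_lists = df.groupby('code')['title'].apply(list)   # keeps duplicates
as_unique = df.groupby('code')['title'].unique()     # drops duplicates

print(as_lists['A001'])
print(list(as_unique['A001']))
```

Which one fits depends on whether repeated titles under the same code should be counted once or every time they occur.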
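Your code compares the full dataframe with code when assigning the titles variable, instead of comparing a single column. catalog == code produces a boolean mask with the same shape as the dataframe, and indexing with that mask replaces every non-matching cell with NaN; since no title ever equals a code, the title column comes back as all NaN. Compare against the code column instead:
for code in codes:
    titles = catalog[catalog['code'] == code]['title'].tolist()
    print(titles)
Or:
for code in codes:
    titles = catalog.loc[catalog['code'] == code, 'title'].tolist()
    print(titles)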
['director', 'president', 'vice president']
['sales director', 'sales vice president']
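To see where the NaN values in the question come from, it may help to look at the two kinds of mask side by side. This is a minimal sketch using the sample data; the names cell_mask, masked and row_mask are illustrative only.

```python
import pandas as pd

catalog = pd.DataFrame({
    'code': ['A001', 'A001', 'A001', 'A002', 'A002'],
    'title': ['director', 'president', 'vice president',
              'sales director', 'sales vice president'],
})

# Comparing the whole dataframe gives one boolean per cell
cell_mask = catalog == 'A001'

# Indexing with a cell-wise mask keeps the shape and blanks out
# every cell where the mask is False
masked = catalog[cell_mask]
print(masked['title'].tolist())   # all NaN: no title equals 'A001'

# Comparing a single column gives one boolean per row,
# so indexing with it selects whole rows
row_mask = catalog['code'] == 'A001'
print(catalog.loc[row_mask, 'title'].tolist())
```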