How to create multiple lists from data frame column value
df1
    Ticker   Category
0   XOM      Group 1
1   CVX      Group 1
2   RDSA-GB  Group 2
3   BP-GB    Group 1, Group 2
4   EQNR-NO  Group 3
5   FP-FR    Group 4
6   ENI-IT   Group 3, Group 4
7   COP      Group 5
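The result I want is a set of lists of "Ticker" values built from the "Category" column, with each list named after its "Category" value and spaces replaced by "_".
Second, if there are rows where Category has two values, e.g. "US Major, Euro Major", how do I make sure the "Ticker" ends up in both Category lists?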
Group_1 = ['XOM','CVX','BP-GB']
Group_2 = ['RDSA-GB','BP-GB']
Group_3 = ['EQNR-NO','ENI-IT']
Group_4 = ['FP-FR','ENI-IT']
Group_5 = ['COP']
Thanks!
Well, you said lists, but I think you mean something like a dictionary? If that is the case, try this:
import pandas as pd
df = pd.DataFrame([["XOM","US Major"],
["CVX","US Major"],
["RDSA-GB","Euro Major"],
["BP-GB","Euro Major"],
["EQNR-NO","Euro Major"]],columns=["Ticker","Category"])
df_to_lists = df.groupby("Category")["Ticker"].apply(list)
lists_to_dict = dict(df_to_lists)
print(lists_to_dict)
output:
{'Euro Major': ['RDSA-GB', 'BP-GB', 'EQNR-NO'], 'US Major': ['XOM', 'CVX']}
If you don't want a dictionary, df_to_lists on its own outputs:
Category
Euro Major [RDSA-GB, BP-GB, EQNR-NO]
US Major [XOM, CVX]
Name: Ticker, dtype: object
You can also use the power of a loop like this (I'm assuming my df is your df1):
lists_with_unique_vals = dict()
for cat in df.Category.unique():
    lists_with_unique_vals[cat.replace(' ', '_')] = list(df[df['Category']==cat]['Ticker'].unique())
The result looks like this:
>> print(lists_with_unique_vals)
{'US_Major': ['XOM', 'CVX'], 'Euro_Major': ['RDSA-GB', 'BP-GB', 'EQNR-NO']}
Following up on @nassiam's code to handle the case where a ticker may have multiple categories:
import pandas as pd
df = pd.DataFrame([["XOM","US Major"],
["CVX","US Major"],
["RDSA-GB","Euro Major"],
["BP-GB","Euro Major"],
["EQNR-NO","Euro Major"],
["ABC-XYZ", "Euro Major, US Major"],
["DEF-GHI", "Euro Major, US Major"]], columns=["Ticker","Category"])
df_to_lists = df.groupby("Category")["Ticker"].apply(list)
lists_to_dict = dict(df_to_lists)
print(lists_to_dict)
# Till here it is the same code as @nassiam pointed out
# To handle multiple-valued category
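# Snapshot the keys first: the dict is modified inside the loop,
# and iterating over a live key view while adding/removing keys raises RuntimeError
keys = list(lists_to_dict.keys())
for key in keys:
    categories = [x.strip() for x in key.split(',')]
    if len(categories) > 1:
        tickers = lists_to_dict.pop(key)
        for cat in categories:
            # setdefault creates the list if the category is new;
            # extend copies the items, so no list object is shared between categories
            lists_to_dict.setdefault(cat, []).extend(tickers)
# To replace space with underscore (build a new dict instead of mutating while iterating)
lists_to_dict = {key.replace(" ", "_"): value for key, value in lists_to_dict.items()}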
This assumes the first column, Ticker, has unique values. Otherwise, use a set when appending to the lists to keep them unique. I hope this helps.
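As a possible alternative to splitting the keys after grouping, the multi-valued categories can be split before grouping. The sketch below is not from either answer above: it assumes pandas 0.25 or later (for `DataFrame.explode`), reuses the df1 data from the question, and introduces a hypothetical helper named `tickers_by_category`. It uses `unique()` per group, so repeated tickers would also be dropped.

```python
import pandas as pd

# Data copied from the question's df1
df1 = pd.DataFrame(
    {
        "Ticker": ["XOM", "CVX", "RDSA-GB", "BP-GB",
                   "EQNR-NO", "FP-FR", "ENI-IT", "COP"],
        "Category": ["Group 1", "Group 1", "Group 2", "Group 1, Group 2",
                     "Group 3", "Group 4", "Group 3, Group 4", "Group 5"],
    }
)


def tickers_by_category(df):
    """Hypothetical helper: map each category (spaces -> '_') to its tickers."""
    # Turn "Group 1, Group 2" into ["Group 1", " Group 2"], then one row per category
    exploded = df.assign(Category=df["Category"].str.split(",")).explode("Category")
    # Trim the whitespace left by the split and replace spaces with underscores
    exploded["Category"] = (
        exploded["Category"].str.strip().str.replace(" ", "_", regex=False)
    )
    # unique() keeps first-seen order and drops repeated tickers within a category
    grouped = exploded.groupby("Category")["Ticker"].unique()
    return {cat: list(tickers) for cat, tickers in grouped.items()}


groups = tickers_by_category(df1)
print(groups["Group_1"])
print(groups["Group_4"])
```

Keeping the lists in a dictionary (`groups["Group_1"]`) rather than creating variables named `Group_1`, `Group_2`, ... follows the first answer's suggestion and avoids generating variable names dynamically.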