Slice a dataframe based on conditions of column names and values
I have a dataframe with languages as column names, plus one final column containing the account name:
EN DE IT Account
Milan Mailand Milano Italy
Florence Florenz Firenze Italy
London London Londra UK
Belgrade Belgrad Belgrado World
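I need to extract information from this dataframe, creating all possible lists from the combinations of column name (language) and value in the Account column.
For example, the output would be: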
EN_Italy = ['Milan', 'Florence']
DE_Italy = ['Mailand', 'Florenz']
IT_Italy = ['Milano', 'Firenze']
EN_UK = ['London']
DE_UK = ['London']
IT_UK = ['Londra']
EN_World = ['Belgrade']
DE_World = ['Belgrad']
IT_World = ['Belgrado']
Is it possible to do this? Thanks!
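To try the answers on the same data, the sample frame from the question can be rebuilt as follows. This is only a reproduction sketch: the variable name `df` matches what the answers use, and the values are copied from the table above.

```python
import pandas as pd

# Sample data copied from the question: one column per language,
# plus a final "Account" column.
df = pd.DataFrame({
    "EN": ["Milan", "Florence", "London", "Belgrade"],
    "DE": ["Mailand", "Florenz", "London", "Belgrad"],
    "IT": ["Milano", "Firenze", "Londra", "Belgrado"],
    "Account": ["Italy", "Italy", "UK", "World"],
})

print(df)
```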
You can use aggregate():
df = df.groupby("Account").aggregate(lambda k: list(k)).reset_index()
Account DE EN IT
0 Italy [Mailand, Florenz] [Milan, Florence] [Milano, Firenze]
1 UK [London] [London] [Londra]
2 World [Belgrad] [Belgrade] [Belgrado]
To get a list, do a simple selection, e.g.
df[df.Account == "Italy"]["DE"]
0 [Mailand, Florenz]
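Note that the selection above returns a one-element Series that wraps the list, not the list itself. One way to get the plain Python list, sketched here under the assumption that `Account` is left as the index after grouping (the names `grouped` and `de_italy` are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "EN": ["Milan", "Florence", "London", "Belgrade"],
    "DE": ["Mailand", "Florenz", "London", "Belgrad"],
    "IT": ["Milano", "Firenze", "Londra", "Belgrado"],
    "Account": ["Italy", "Italy", "UK", "World"],
})

# Group by account and collect each language column into a list.
# Without reset_index(), "Account" stays as the index of the result.
grouped = df.groupby("Account").agg(list)

# Label-based lookup: row "Italy", column "DE" gives the list itself.
de_italy = grouped.loc["Italy", "DE"]
print(de_italy)
```

If you keep the `reset_index()` version from the answer instead, taking `.iloc[0]` of the selected Series should give the same list.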
For a variable number of variables, a dictionary is usually a good choice.
You can use collections.defaultdict:
from collections import defaultdict
d = defaultdict(list)
for row in df.itertuples():
    # row._fields is ('Index', 'EN', 'DE', 'IT', 'Account'); skip the first and last
    for i in row._fields[1:-1]:
        d[i+'_'+row.Account].append(getattr(row, i))
Result:
defaultdict(list,
{'DE_Italy': ['Mailand', 'Florenz'],
'DE_UK': ['London'],
'DE_World': ['Belgrad'],
'EN_Italy': ['Milan', 'Florence'],
'EN_UK': ['London'],
'EN_World': ['Belgrade'],
'IT_Italy': ['Milano', 'Firenze'],
'IT_UK': ['Londra'],
'IT_World': ['Belgrado']})
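The loop above relies on the language columns sitting between the index and a trailing `Account` column. A variant that picks the language columns by name instead might look like the sketch below; `langs` and `result` are illustrative names, and it assumes every column other than `Account` is a language.

```python
from collections import defaultdict

import pandas as pd

df = pd.DataFrame({
    "EN": ["Milan", "Florence", "London", "Belgrade"],
    "DE": ["Mailand", "Florenz", "London", "Belgrad"],
    "IT": ["Milano", "Firenze", "Londra", "Belgrado"],
    "Account": ["Italy", "Italy", "UK", "World"],
})

# Assumption: every column except "Account" is a language column.
langs = [col for col in df.columns if col != "Account"]

d = defaultdict(list)
for row in df.itertuples(index=False):
    for lang in langs:
        # Key is "<language>_<account>", e.g. "EN_Italy".
        d[f"{lang}_{row.Account}"].append(getattr(row, lang))

# Convert to a plain dict so a mistyped key raises KeyError
# instead of silently creating an empty list.
result = dict(d)
print(result["EN_Italy"])
```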
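Explanation: defaultdict(list) creates an empty list the first time a key is accessed, so each value can be appended without first checking whether the key exists.
Using unstack: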
df.set_index('Account').unstack().groupby(level=[0, 1]).apply(list)
Account
EN Italy [Milan, Florence]
UK [London]
World [Belgrade]
DE Italy [Mailand, Florenz]
UK [London]
World [Belgrad]
IT Italy [Milano, Firenze]
UK [Londra]
World [Belgrado]
dtype: object
d = df.set_index('Account').unstack().groupby(level=[0, 1]).apply(list)
d.index = d.index.map('_'.join)
d
EN_Italy [Milan, Florence]
EN_UK [London]
EN_World [Belgrade]
DE_Italy [Mailand, Florenz]
DE_UK [London]
DE_World [Belgrad]
IT_Italy [Milano, Firenze]
IT_UK [Londra]
IT_World [Belgrado]
dtype: object
Or, as a dictionary:
d.to_dict()
{'DE_Italy': ['Mailand', 'Florenz'],
'DE_UK': ['London'],
'DE_World': ['Belgrad'],
'EN_Italy': ['Milan', 'Florence'],
'EN_UK': ['London'],
'EN_World': ['Belgrade'],
'IT_Italy': ['Milano', 'Firenze'],
'IT_UK': ['Londra'],
'IT_World': ['Belgrado']}
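A similar reshape-then-group result can also be sketched with `melt` in place of `unstack`, which avoids depending on how `unstack` treats the repeated `Account` labels in the index. This is a swapped-in technique, not the answer's own method; the column names `lang` and `name` are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "EN": ["Milan", "Florence", "London", "Belgrade"],
    "DE": ["Mailand", "Florenz", "London", "Belgrad"],
    "IT": ["Milano", "Firenze", "Londra", "Belgrado"],
    "Account": ["Italy", "Italy", "UK", "World"],
})

# Long format: one row per (Account, language, name) triple.
long_df = df.melt(id_vars="Account", var_name="lang", value_name="name")

# Collect the names for each (language, account) pair;
# sort=False keeps the pairs in order of first appearance.
d = long_df.groupby(["lang", "Account"], sort=False)["name"].agg(list)

# Flatten the two-level index into "EN_Italy"-style labels.
d.index = d.index.map("_".join)

result = d.to_dict()
print(result["IT_Italy"])
```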
Just another approach, using a dict comprehension:
accts = df['Account'].unique()  # each account once, so no key is rebuilt
langs = [col for col in df.columns if col != 'Account']
result = {'{}_{}'.format(lang, acct): df.loc[df['Account'] == acct, lang].tolist()
          for lang in langs for acct in accts}
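The comprehension above filters the whole frame once per (language, account) pair. For larger frames, a variant that lets `groupby` split the rows once per account may be worth trying. This is only a sketch under the same assumption that every column except `Account` is a language:

```python
import pandas as pd

df = pd.DataFrame({
    "EN": ["Milan", "Florence", "London", "Belgrade"],
    "DE": ["Mailand", "Florenz", "London", "Belgrad"],
    "IT": ["Milano", "Firenze", "Londra", "Belgrado"],
    "Account": ["Italy", "Italy", "UK", "World"],
})

langs = [col for col in df.columns if col != "Account"]

# Iterating over a groupby yields (account, sub-frame) pairs,
# so each account's rows are selected only once.
result = {
    f"{lang}_{acct}": group[lang].tolist()
    for acct, group in df.groupby("Account")
    for lang in langs
}
print(result["DE_Italy"])
```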