Python sorting values and grouping them based on unique keys
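I have a list of tuples like the one below. I want to group the elements into a two-dimensional layout of rows and columns. For example:
Say the list is called "list":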
[("Adam", "DeltaAir"),
("Bianca", "AlaskanAir"),
("Romeo", "DeltaAir"),
("Danaerys", "DragonAir"),
("Jon", "DragonAir"),
("Walter", "AlaskanAir")]
I want to print this list as:
------------------------------------------
Name | AlaskanAir | DeltaAir | DragonAir
------------------------------------------
Adam                   *
Bianca     *
Romeo                  *
Danaerys                           *
Jon                                *
Walter     *
------------------------------------------
First I find all the unique elements that will be used as the header row.
row = []
for i in list:
    row.append(i[1])
# sorted() is used here because the data variable named "list" shadows the built-in list()
row = sorted(set(row))
Then I will iterate over the elements in "row" and build the table. How can I build it easily? Thanks!
We can do this with pandas:
import pandas as pd

df = pd.DataFrame([("Adam", "DeltaAir"),
                   ("Bianca", "AlaskanAir"),
                   ("Romeo", "DeltaAir"),
                   ("Danaerys", "DragonAir"),
                   ("Jon", "DragonAir"),
                   ("Walter", "AlaskanAir")], columns=['name', 'value'])
airlines = df['value'].unique()
# dtype=int keeps the indicator columns as 0/1 (newer pandas versions default to booleans)
result = (pd.get_dummies(df, columns=['value'], dtype=int)
          .rename(columns={f'value_{col}': col for col in airlines})
          .replace({col: {0: '', 1: '*'} for col in airlines}))
print(result)
Output:
       name  AlaskanAir  DeltaAir  DragonAir
0      Adam                     *
1    Bianca           *
2     Romeo                     *
3  Danaerys                                *
4       Jon                                *
5    Walter           *
This converts each person's value into a 1 or 0 in the corresponding column. We then simply replace 1 with * and 0 with an empty string.
Note that pandas is not required for the logic, which can be done simply without it, but it is handy for aligning the table.
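As a rough illustration of the pandas-free route mentioned above, here is a minimal sketch in plain Python. The function name build_table and the variable pairs are hypothetical, chosen only for this example. It groups the airlines per name in a dict, so a name that appears more than once ends up on a single row, and it pads each cell by hand to keep the columns aligned.

```python
def build_table(pairs):
    """Return the marker table for (name, airline) pairs as a list of text lines."""
    # Column headers: the unique airlines, sorted for a stable order.
    airlines = sorted({airline for _, airline in pairs})

    # Group airlines by name. A dict keeps names in first-seen order,
    # so a repeated name is merged into one row.
    by_name = {}
    for name, airline in pairs:
        by_name.setdefault(name, set()).add(airline)

    # Pad every cell to its column width so the markers line up.
    name_width = max([len("Name")] + [len(name) for name in by_name])
    header = " | ".join(["Name".ljust(name_width)] + airlines)
    rule = "-" * len(header)

    lines = [rule, header, rule]
    for name, flown in by_name.items():
        cells = [name.ljust(name_width)]
        cells += [("*" if a in flown else "").center(len(a)) for a in airlines]
        lines.append(" | ".join(cells))
    lines.append(rule)
    return lines


pairs = [("Adam", "DeltaAir"),
         ("Bianca", "AlaskanAir"),
         ("Romeo", "DeltaAir"),
         ("Danaerys", "DragonAir"),
         ("Jon", "DragonAir"),
         ("Walter", "AlaskanAir")]
print("\n".join(build_table(pairs)))
```

Unlike the layout requested in the question, this sketch repeats the | separators on every row, which keeps the column boundaries visible even for empty cells.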
import pandas as pd

lst = [("Adam", "DeltaAir"),
       ("Bianca", "AlaskanAir"),
       ("Romeo", "DeltaAir"),
       ("Danaerys", "DragonAir"),
       ("Jon", "DragonAir"),
       ("Walter", "AlaskanAir")]
# Create a pandas DataFrame with the names from the list
df = pd.DataFrame([elem[0] for elem in lst], columns=["Name"])
# Iterate over the unique airlines (DeltaAir, AlaskanAir, DragonAir), sorted for a stable column order
for elem in sorted(set(elem[1] for elem in lst)):
    # Build a list holding "*" or " " for every entry in lst, depending on whether
    # the entry matches the airline currently being iterated over, and add that list
    # as a column to the DataFrame
    df[elem] = ["*" if item[1] == elem else " " for item in lst]
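Edit in response to your comment:
You can use groupby with an aggregation to combine the values by name (if that is not what you meant, please clarify).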
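# Assumes df was built as above, from a list lst that contains repeated names.
# groupby returns its groups sorted by name, which matches the sorted names in df2
df2 = pd.DataFrame(sorted(df["Name"].unique()), columns=["Name"])
# Sorting the airlines keeps the column order stable (iterating a bare set does not)
for elem in sorted(set(elem[1] for elem in lst)):
    # A name gets "*" in this column if any of its rows has one there
    df2[elem] = list(df.groupby("Name")[elem].agg(lambda x: "*" if "*" in x.values else " "))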
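Another way to merge repeated names, sketched here as an alternative rather than as the answerer's method, is pd.crosstab, which counts how often each name and airline pair occurs. The column names "Name" and "Airline" and the variable names are assumptions made for this example.

```python
import pandas as pd

pairs = [("Adam", "DeltaAir"),
         ("Bianca", "AlaskanAir"),
         ("Romeo", "DeltaAir"),
         ("Danaerys", "DragonAir"),
         ("Jon", "DragonAir"),
         ("Walter", "AlaskanAir"),
         ("Adam", "AlaskanAir"),
         ("Romeo", "DragonAir")]
df = pd.DataFrame(pairs, columns=["Name", "Airline"])

# crosstab counts each (Name, Airline) combination, giving one row
# per unique name and one column per unique airline.
counts = pd.crosstab(df["Name"], df["Airline"])

# Any non-zero count becomes "*", a zero becomes an empty string.
table = counts.gt(0).apply(lambda col: col.map({True: "*", False: ""}))
print(table)
```

Note that crosstab sorts the rows by name, so the order differs from the order in which names first appear in the list.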
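Adding more information
Thanks, Florian. What I mean is that if a name is repeated, as shown below, the corresponding airline columns should all be filled in on that name's row. For example, with Adam and Romeo each appearing twice, the result would look like this, rather than using two separate rows for the same name.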
[("Adam", "DeltaAir"),
("Bianca", "AlaskanAir"),
("Romeo", "DeltaAir"),
("Danaerys", "DragonAir"),
("Jon", "DragonAir"),
("Walter", "AlaskanAir"),
("Adam", "AlaskanAir"),
("Romeo", "DragonAir")]
------------------------------------------
Name | AlaskanAir | DeltaAir | DragonAir
------------------------------------------
Adam       *           *
Bianca     *
Romeo                  *           *
Danaerys                           *
Jon                                *
Walter     *
------------------------------------------