Melt multiple columns of a pandas dataframe based on criteria
I have a pandas dataframe as follows:
import pandas as pd

dataframe = pd.DataFrame(
    {
        'ID': [1, 2, 3, 4],
        'Gender': ['F', 'F', 'M', 'M'],
        'Language': ['EN', 'ES', 'EN', 'EN'],
        'Year 1': [2020, 2020, 2020, 2020],
        'Score 1': [93, 97, 83, 86],
        'Year 2': [2020, 2020, None, 2018],
        'Score 2': [85, 95, None, 55],
        'Year 3': [2020, 2018, None, None],
        'Score 3': [87, 86, None, None]
    }
)
ID | Gender | Language | Year 1 | Score 1 | Year 2 | Score 2 | Year 3 | Score 3
---|---|---|---|---|---|---|---|---
1 | F | EN | 2020 | 93 | 2020 | 85 | 2020 | 87
2 | F | ES | 2020 | 97 | 2020 | 95 | 2018 | 86
3 | M | EN | 2020 | 83 | | | |
4 | M | EN | 2020 | 86 | 2018 | 55 | |
I would like to melt the year columns together with their corresponding scores, keeping only the pairs that match a given year. For example, keeping every pair whose year equals 2020 would generate the following:
ID | Gender | Language | Year | Score
---|---|---|---|---
1 | F | EN | 2020 | 93
1 | F | EN | 2020 | 85
1 | F | EN | 2020 | 87
2 | F | ES | 2020 | 97
2 | F | ES | 2020 | 95
3 | M | EN | 2020 | 83
4 | M | EN | 2020 | 86
I have tried using pd.melt, but am having trouble filtering by the year across the columns and keeping the corresponding entries.
From what I understand, you may try:
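out = (
    pd.wide_to_long(
        dataframe,
        stubnames=['Year', 'Score'],
        i=['ID', 'Gender', 'Language'],
        j='v',
        sep=' ',
    )
    .dropna()       # drop the Year/Score pairs that were empty
    .droplevel(-1)  # drop the numeric suffix level ('v')
    .reset_index()
)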
print(out)
ID Gender Language Year Score
0 1 F EN 2020.0 93.0
1 1 F EN 2020.0 85.0
2 1 F EN 2020.0 87.0
3 2 F ES 2020.0 97.0
4 2 F ES 2020.0 95.0
5 2 F ES 2018.0 86.0
6 3 M EN 2020.0 83.0
7 4 M EN 2020.0 86.0
8 4 M EN 2018.0 55.0
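The output above still contains the 2018 rows, because the snippet only drops the empty pairs. A minimal sketch of one way to add the year criterion, assuming the same `dataframe` as in the question and a target year of 2020 (the `.query` step is an addition, not part of the original answer):

```python
import pandas as pd

dataframe = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4],
        "Gender": ["F", "F", "M", "M"],
        "Language": ["EN", "ES", "EN", "EN"],
        "Year 1": [2020, 2020, 2020, 2020],
        "Score 1": [93, 97, 83, 86],
        "Year 2": [2020, 2020, None, 2018],
        "Score 2": [85, 95, None, 55],
        "Year 3": [2020, 2018, None, None],
        "Score 3": [87, 86, None, None],
    }
)

out = (
    pd.wide_to_long(
        dataframe,
        stubnames=["Year", "Score"],
        i=["ID", "Gender", "Language"],
        j="v",
        sep=" ",
    )
    .dropna()               # drop the empty Year/Score pairs
    .query("Year == 2020")  # added step: keep only the requested year
    .droplevel(-1)          # drop the numeric suffix level
    .reset_index()
)
print(out)
```

The `Year` and `Score` columns come back as floats because the missing values force a float dtype during the reshape.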
Alternatively, reshape with pd.wide_to_long first and then keep only the rows where Year equals 2020:
long_df = (
    pd.wide_to_long(
        dataframe,
        stubnames=["Year", "Score"],
        i="ID",
        j="Repetition",
        sep=" ",
    )
    .reset_index()
)
df_2020 = (
    long_df[long_df["Year"] == 2020]
    .drop("Repetition", axis=1)
    .sort_values("ID")
    .reset_index(drop=True)
)
print(df_2020)
ID Gender Language Year Score
0 1 F EN 2020.0 93.0
1 1 F EN 2020.0 85.0
2 1 F EN 2020.0 87.0
3 2 F ES 2020.0 97.0
4 2 F ES 2020.0 95.0
5 3 M EN 2020.0 83.0
6 4 M EN 2020.0 86.0
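Since the question started from pd.melt, here is a rough sketch of how the same result might be reached with melt followed by a pivot. It is not taken from either answer; the helper column names `field` and `slot` are made up for illustration, and passing a list as the pivot `index` assumes pandas 1.1 or later. The final cast to the nullable `Int64` dtype is optional and only there to show the years and scores as integers, as in the desired output.

```python
import pandas as pd

dataframe = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4],
        "Gender": ["F", "F", "M", "M"],
        "Language": ["EN", "ES", "EN", "EN"],
        "Year 1": [2020, 2020, 2020, 2020],
        "Score 1": [93, 97, 83, 86],
        "Year 2": [2020, 2020, None, 2018],
        "Score 2": [85, 95, None, 55],
        "Year 3": [2020, 2018, None, None],
        "Score 3": [87, 86, None, None],
    }
)

id_cols = ["ID", "Gender", "Language"]

# Melt every Year/Score column into (variable, value) pairs.
melted = dataframe.melt(id_vars=id_cols)

# Split e.g. "Year 1" into the field name ("Year") and its slot ("1").
# "field" and "slot" are hypothetical helper names.
melted[["field", "slot"]] = melted["variable"].str.split(" ", expand=True)

# Pivot the field names back into columns, so each row is one Year/Score pair.
paired = melted.pivot(
    index=id_cols + ["slot"], columns="field", values="value"
).reset_index()

# Keep only the pairs from 2020 and tidy up the result.
result = (
    paired.loc[paired["Year"] == 2020, id_cols + ["Year", "Score"]]
    .astype({"Year": "Int64", "Score": "Int64"})
    .rename_axis(columns=None)
    .reset_index(drop=True)
)
print(result)
```

Compared with pd.wide_to_long this takes more steps, but it makes the pairing of each year with its score explicit, which is the part a plain pd.melt call loses.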