I have a list of dataframes:
import pandas as pd
df1 = pd.DataFrame({'col1': ['a', 'a', 'b', 'c', 'c', 'd'], '2020': [3, 4, 3, 8, 4, 5]})
df2 = pd.DataFrame({'col1': ['a', 'a', 'b', 'c', 'c', 'd'], '2021': [30, 40, 30, 80, 40, 50]})
df3 = pd.DataFrame({'col1': ['a', 'a', 'b', 'c', 'c', 'd'], '2022': [31, 41, 31, 81, 41, 51]})
listOfDf = [df1, df2, df3]
Then I add them side by side
sideBySideDataframe = pd.concat(listOfDf, axis=1)
Which gives me this
2020 col1 2021 col1 2022 col1
0 3 'a' 30 'a' 31 'a'
1 4 'a' 40 'a' 41 'a'
2 3 'b' 30 'b' 31 'b'
3 8 'c' 80 'c' 81 'c'
...
Now I only want to keep col1 once, so I tried to delete the extra col1 columns by index:
sideBySideDataframe = sideBySideDataframe.drop(sideBySideDataframe.columns[[1, 3]],axis = 1)
However, this deleted all of the col1 columns. I also tried
sideBySideDataframe = sideBySideDataframe.drop(sideBySideDataframe.columns[[1]],axis = 1)
with the same effect. However, when I use
sideBySideDataframe = sideBySideDataframe.set_index('col1')
I get
2020 2021 2022
col1
('a', 'a', 'a') ...
('a', 'a', 'a')
('b', 'b', 'b')
('c', 'c', 'c')
My output should be
2020 2021 2022
col1
'a' ...
'a'
'b'
'c'
So I am not sure why pandas deletes all of the col1 columns even though I only reference one of them by index. Is there a way to keep only one copy of a duplicated column name when doing pd.concat(listOfDf, axis=1)? Or how would I set the index so that it does not combine the values from every column named col1?
I'd say you set col1 as the index, then concat:
sideBySideDataframe = pd.concat([d.set_index('col1') for d in listOfDf], axis=1)
Output:
2020 2021 2022
col1
a 3 30 31
a 4 40 41
b 3 30 31
c 8 80 81
c 4 40 41
d 5 50 51
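As for why the drop attempts removed every col1: as far as I can tell, sideBySideDataframe.columns[[1]] just evaluates to the label 'col1', and drop works on labels, so every column carrying that label goes. If you would rather keep col1 as a regular column than as the index, a boolean mask built from columns.duplicated() is one option. This is a minimal sketch, not the only way to do it; the names wide, dropped and deduped are made up for illustration, and I'm assuming a recent pandas where col1 comes first in each frame (so I use position 0 rather than 1).

```python
import pandas as pd

df1 = pd.DataFrame({'col1': ['a', 'a', 'b', 'c', 'c', 'd'], '2020': [3, 4, 3, 8, 4, 5]})
df2 = pd.DataFrame({'col1': ['a', 'a', 'b', 'c', 'c', 'd'], '2021': [30, 40, 30, 80, 40, 50]})
df3 = pd.DataFrame({'col1': ['a', 'a', 'b', 'c', 'c', 'd'], '2022': [31, 41, 31, 81, 41, 51]})

# Side-by-side concat keeps all three 'col1' columns under the same label.
wide = pd.concat([df1, df2, df3], axis=1)

# wide.columns[[0]] is the label 'col1', not a position, so drop()
# removes every column with that label.
dropped = wide.drop(wide.columns[[0]], axis=1)

# duplicated() marks the second and later occurrences of each label;
# inverting the mask keeps only the first 'col1'.
deduped = wide.loc[:, ~wide.columns.duplicated()]

print(list(dropped.columns))
print(list(deduped.columns))
```

Note that the mask only looks at column names, so it assumes the repeated col1 columns really hold the same values in the same row order, as they do in your example.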