pandas concat list of dataframes to one dataframe
I'm trying to combine a list of dataframes into one dataframe. The data looks like:
Date station_id Hour Temp
0 2004-01-01 1 1 46.0
1 2004-01-01 1 2 46.0
2 2004-01-01 1 3 45.0
3 2004-01-01 1 4 41.0
...
433730 2008-06-30 11 3 64.0
433731 2008-06-30 11 4 64.0
433732 2008-06-30 11 5 64.0
433733 2008-06-30 11 6 64.0
This gives me a list of dataframes:
stations = [x for _,x in df.groupby('station_id')]
When I reset the indices for "stations" and concat, I get a dataframe, but it doesn't look the way I'd like:
for i in range(0,11):
    stations[i].reset_index(drop=True,inplace=True)
pd.concat(stations,axis=1)
Date station_id Hour Temp Date station_id Hour Temp
0 2004-01-01 1 1 46.0 2004-01-01 2 1 38.0
1 2004-01-01 1 2 46.0 2004-01-01 2 2 36.0
2 2004-01-01 1 3 45.0 2004-01-01 2 3 35.0
3 2004-01-01 1 4 41.0 2004-01-01 2 4 30.0
I'd much rather end up with a df like this:
Date Hour Stn1 Stn2
0 2004-01-01 1 46.0 38.0
1 2004-01-01 2 46.0 36.0
2 2004-01-01 3 45.0 35.0
3 2004-01-01 4 41.0 30.0
How do I do this?
Based on your expected output, you are looking for a pivot table with index=['Date', 'Hour'], columns='station_id', values='Temp'. Demo:
# A bunch of example data
df
Date station_id Hour Temp
0 2004-01-01 1 1 10.0
1 2004-01-01 1 2 20.0
2 2004-01-01 1 3 30.0
3 2004-01-01 1 4 40.0
4 2004-01-01 2 1 50.0
5 2004-01-01 2 2 60.0
6 2004-01-01 2 3 70.0
7 2004-01-01 2 4 80.0
8 2004-01-01 3 1 90.0
9 2004-01-01 3 2 100.0
10 2004-01-02 3 1 110.0
11 2004-01-02 3 2 120.0
12 2004-01-01 4 4 130.0
13 2004-01-02 4 5 140.0
# Create pivot table, with ['Date', 'Hour'] in a MultiIndex
res = df.pivot_table(columns='station_id', index=['Date', 'Hour'], values='Temp')
# Add 'Stn' prefix to each column name
res = res.add_prefix('Stn')
# Clear the name of the columns' index, which is 'station_id'
# (assigning None is used here because `del res.columns.name` fails on recent pandas versions)
res.columns.name = None
# Reset MultiIndex into columns
res.reset_index(inplace=True)
res
Date Hour Stn1 Stn2 Stn3 Stn4
0 2004-01-01 1 10.0 50.0 90.0 NaN
1 2004-01-01 2 20.0 60.0 100.0 NaN
2 2004-01-01 3 30.0 70.0 NaN NaN
3 2004-01-01 4 40.0 80.0 NaN 130.0
4 2004-01-02 1 NaN NaN 110.0 NaN
5 2004-01-02 2 NaN NaN 120.0 NaN
6 2004-01-02 5 NaN NaN NaN 140.0
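The same reshape can also be written with set_index and unstack. The following is a minimal, self-contained sketch on a small made-up sample (the values are illustrative, not taken from the question's data). One difference worth knowing: pivot_table aggregates duplicate (Date, Hour, station_id) rows with the mean by default, while unstack raises an error on duplicates, which may be preferable if each reading is expected to be unique.

```python
import pandas as pd

# Small made-up sample in the same long format as the question
df = pd.DataFrame({
    "Date": ["2004-01-01"] * 4 + ["2004-01-02"] * 2,
    "station_id": [1, 1, 2, 2, 1, 2],
    "Hour": [1, 2, 1, 2, 1, 1],
    "Temp": [10.0, 20.0, 50.0, 60.0, 30.0, 70.0],
})

# Index by (Date, Hour, station_id), then move station_id into the columns
res = df.set_index(["Date", "Hour", "station_id"])["Temp"].unstack("station_id")

# Rename columns 1, 2, ... to Stn1, Stn2, ... and drop the 'station_id' label
res = res.add_prefix("Stn")
res.columns.name = None

# Turn the (Date, Hour) MultiIndex back into ordinary columns
res = res.reset_index()
print(res)
```

Any (Date, Hour) pair that a station has no reading for ends up as NaN in that station's column, just as in the pivot_table output.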
For what it's worth, this gets me where I want to go.
stations = [x for _,x in df.groupby('station_id')] #,as_index=True)]
for i in range(0,11):
    stations[i].reset_index(drop=True,inplace=True)
    stations[i].rename(columns={'Temp':'Stn'+str(i+1)},inplace=True)
    stations[i].drop(columns='station_id',inplace=True)
    if i>0:
        stations[i].drop(columns=['Date','Hour'],inplace=True)
stations = pd.concat(stations,axis=1)
It feels a bit brute force to me, though. Additional pythonic suggestions are welcome.
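If you want to stay with pd.concat, one possible tidier variant is to index each station's Temp by (Date, Hour) and pass a dict to concat, so the dict keys become the column names. This is only a sketch on made-up sample data. Unlike the loop above, it aligns rows on Date and Hour rather than on row position, so stations with missing hours get NaN instead of silently shifted values.

```python
import pandas as pd

# Made-up sample: station 2 has no reading for 2004-01-01 hour 2,
# and station 1 has none for 2004-01-02 hour 1
df = pd.DataFrame({
    "Date": ["2004-01-01", "2004-01-01", "2004-01-01", "2004-01-02"],
    "station_id": [1, 1, 2, 2],
    "Hour": [1, 2, 1, 1],
    "Temp": [10.0, 20.0, 50.0, 70.0],
})

# Map "Stn<id>" to that station's Temp series, indexed by (Date, Hour)
pieces = {
    f"Stn{sid}": group.set_index(["Date", "Hour"])["Temp"]
    for sid, group in df.groupby("station_id")
}

# Concatenate side by side; rows are matched on the (Date, Hour) index
wide = pd.concat(pieces, axis=1).reset_index()
print(wide)
```

The dict comprehension replaces the rename/drop steps, and the station label comes from the actual station_id rather than from the loop counter.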