[英]How to split data to multiple rows in pandas on one condition?
I have the data in the below dataframe as:- 我将以下数据框中的数据表示为: -
id name value year quarter
1 an 2.3 2012 1
2 yu 3.5 2012 2
3 ij 3.1 2013 4
4 ij 2.1 2013 1
to be converted to below dataframe ie get month from quarter and split row into 3. 要转换为低于数据帧,即从季度获得月份,将行划分为3。
id name value year quarter month
1 an 2.3 2012 1 01
1 an 2.3 2012 1 02
1 an 2.3 2012 1 03
2 yu 3.5 2012 2 04
2 yu 3.5 2012 2 05
2 yu 3.5 2012 2 06
3 ij 3.1 2013 4 10
3 ij 3.1 2013 4 11
3 ij 3.1 2013 4 12
4 ij 2.1 2013 1 01
4 ij 2.1 2013 1 02
4 ij 2.1 2013 1 03
Create a quarter to month dataframe to merge on 创建要合并的季度到月份数据框
q2m = pd.DataFrame([
[(m - 1) // 3 + 1, m] for m in range(1, 13)],
columns=['quarter', 'month']
)
df.merge(q2m)
id name value year quarter month
0 1 an 2.3 2012 1 1
1 1 an 2.3 2012 1 2
2 1 an 2.3 2012 1 3
3 2 yu 3.5 2012 2 4
4 2 yu 3.5 2012 2 5
5 2 yu 3.5 2012 2 6
6 3 ij 3.1 2013 4 10
7 3 ij 3.1 2013 4 11
8 3 ij 3.1 2013 4 12
First, create a DataFrame with the month ranges of each quarter in your current DataFrame: 首先,创建一个DataFrame,其中包含当前DataFrame中每个季度的月份范围:
m = pd.DataFrame([range(i*3-2, 3*i+1) for i in df.quater], index=df.quater)
0 1 2
quater
1 1 2 3
2 4 5 6
4 10 11 12
Now join and stack: 现在加入并堆叠:
df.set_index('quater').join(m.stack().reset_index(1, drop=True).rename('month'))
id name value year month
quater
1 1 an 2.3 2012 1
1 1 an 2.3 2012 2
1 1 an 2.3 2012 3
2 2 yu 3.5 2012 4
2 2 yu 3.5 2012 5
2 2 yu 3.5 2012 6
4 3 ij 3.1 2013 10
4 3 ij 3.1 2013 11
4 3 ij 3.1 2013 12
You could use repeat 你可以使用重复
In [360]: dff = df.loc[df.index.repeat(3)]
In [362]: dff.assign(month = dff.quater.sub(1) * 3 + dff.groupby('quater').cumcount() + 1)
Out[362]:
id name value year quater month
0 1 an 2.3 2012 1 1
0 1 an 2.3 2012 1 2
0 1 an 2.3 2012 1 3
1 2 yu 3.5 2012 2 4
1 2 yu 3.5 2012 2 5
1 2 yu 3.5 2012 2 6
2 3 ij 3.1 2013 4 10
2 3 ij 3.1 2013 4 11
2 3 ij 3.1 2013 4 12
Using reindex
with pd.to_datetime
, and we adding the cumcount
for each sub group 将
reindex
与pd.to_datetime
一起pd.to_datetime
,我们为每个子组添加cumcount
df=df.reindex(df.index.repeat(3))
df['Month']=pd.to_datetime(df[['year','quarter']].astype(str).apply('Q'.join,1)).dt.month+df.groupby(level=0).cumcount()
df
Out[1258]:
id name value year quarter Month
0 1 an 2.3 2012 1 1
0 1 an 2.3 2012 1 2
0 1 an 2.3 2012 1 3
1 2 yu 3.5 2012 2 4
1 2 yu 3.5 2012 2 5
1 2 yu 3.5 2012 2 6
2 3 ij 3.1 2013 4 10
2 3 ij 3.1 2013 4 11
2 3 ij 3.1 2013 4 12
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.