How to split data to multiple rows in pandas on one condition?

I have the following data in a DataFrame:

id name value year quarter
1  an   2.3   2012 1
2  yu   3.5   2012 2
3  ij   3.1   2013 4
4  ij   2.1   2013 1

I want to convert it to the DataFrame below, i.e. derive the months from the quarter and split each row into three rows:

id name value year quarter month
1  an   2.3   2012 1       01
1  an   2.3   2012 1       02
1  an   2.3   2012 1       03
2  yu   3.5   2012 2       04
2  yu   3.5   2012 2       05
2  yu   3.5   2012 2       06
3  ij   3.1   2013 4       10
3  ij   3.1   2013 4       11
3  ij   3.1   2013 4       12
4  ij   2.1   2013 1       01
4  ij   2.1   2013 1       02
4  ij   2.1   2013 1       03
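To run the answers against the question's full sample (four rows, including id 4, which shares quarter 1 with id 1), the input can be built as follows. The helper quarter_months is hypothetical, added only to spell out the mapping every answer relies on: quarter q covers months 3q - 2 through 3q.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'name': ['an', 'yu', 'ij', 'ij'],
    'value': [2.3, 3.5, 3.1, 2.1],
    'year': [2012, 2012, 2013, 2013],
    'quarter': [1, 2, 4, 1],
})

# Hypothetical helper: quarter q covers months 3*q - 2, 3*q - 1 and 3*q.
def quarter_months(q):
    return [3 * q - 2, 3 * q - 1, 3 * q]

print(df)
print(quarter_months(4))  # [10, 11, 12]
```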

Create a quarter-to-month DataFrame to merge on:

import pandas as pd

q2m = pd.DataFrame([
    [(m - 1) // 3 + 1, m] for m in range(1, 13)],
    columns=['quarter', 'month']
)

df.merge(q2m)

   id name  value  year  quarter  month
0   1   an    2.3  2012        1      1
1   1   an    2.3  2012        1      2
2   1   an    2.3  2012        1      3
3   2   yu    3.5  2012        2      4
4   2   yu    3.5  2012        2      5
5   2   yu    3.5  2012        2      6
6   3   ij    3.1  2013        4     10
7   3   ij    3.1  2013        4     11
8   3   ij    3.1  2013        4     12
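A self-contained sketch of the merge approach on the question's four-row sample. The output above covers only ids 1 to 3; the sort_values call is an addition of this sketch, so the result is grouped by id regardless of how merge orders the matched rows.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'name': ['an', 'yu', 'ij', 'ij'],
    'value': [2.3, 3.5, 3.1, 2.1],
    'year': [2012, 2012, 2013, 2013],
    'quarter': [1, 2, 4, 1],
})

# Lookup table: one row per month, tagged with the quarter it belongs to.
q2m = pd.DataFrame(
    [[(m - 1) // 3 + 1, m] for m in range(1, 13)],
    columns=['quarter', 'month'],
)

# merge() joins on the shared 'quarter' column (inner join by default),
# so every input row is paired with the three months of its quarter.
out = (
    df.merge(q2m, on='quarter')
      .sort_values(['id', 'month'])
      .reset_index(drop=True)
)
print(out)
```

Because q2m has exactly three rows per quarter, each input row yields exactly three output rows. A quarter value outside 1 to 4 would silently drop that row under the default inner join.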

First, create a DataFrame with the month range of each distinct quarter in your current DataFrame. Drop duplicate quarters first; otherwise a quarter that appears twice (ids 1 and 4 are both in quarter 1) ends up in m twice, and each of its rows is joined to every month twice:

quarters = df.quarter.drop_duplicates()
m = pd.DataFrame([range(3*i-2, 3*i+1) for i in quarters], index=quarters)

          0   1   2
quarter
1         1   2   3
2         4   5   6
4        10  11  12

Now stack the month columns into rows and join them back on quarter:

df.set_index('quarter').join(m.stack().reset_index(1, drop=True).rename('month'))

         id name  value  year  month
quarter
1         1   an    2.3  2012      1
1         1   an    2.3  2012      2
1         1   an    2.3  2012      3
2         2   yu    3.5  2012      4
2         2   yu    3.5  2012      5
2         2   yu    3.5  2012      6
4         3   ij    3.1  2013     10
4         3   ij    3.1  2013     11
4         3   ij    3.1  2013     12
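A runnable sketch of the stack/join approach on the full four-row sample. Deduplicating the quarters before building m is what keeps ids 1 and 4 (both quarter 1) at three months each; the final reset_index and sort are added here only to restore quarter as a column and make the row order predictable. Recent pandas versions may emit a FutureWarning from stack() about its new implementation; for this input the result should be the same either way.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'name': ['an', 'yu', 'ij', 'ij'],
    'value': [2.3, 3.5, 3.1, 2.1],
    'year': [2012, 2012, 2013, 2013],
    'quarter': [1, 2, 4, 1],
})

# One row per distinct quarter; columns 0..2 hold that quarter's months.
quarters = df['quarter'].drop_duplicates()
m = pd.DataFrame([range(3 * q - 2, 3 * q + 1) for q in quarters], index=quarters)

# stack() turns the three month columns into rows; dropping the inner
# index level leaves a Series of months indexed by quarter.
months = m.stack().reset_index(level=1, drop=True).rename('month')

out = (
    df.set_index('quarter')
      .join(months)
      .reset_index()
      .sort_values(['id', 'month'], ignore_index=True)
)
print(out)
```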

You could use repeat, then number the copies of each original row with cumcount. Group by the index (level=0) rather than by quarter, so that two rows in the same quarter (such as ids 1 and 4) are counted separately:

In [360]: dff = df.loc[df.index.repeat(3)]

In [362]: dff.assign(month = dff.quarter.sub(1) * 3 + dff.groupby(level=0).cumcount() + 1)
Out[362]:
   id name  value  year  quarter  month
0   1   an    2.3  2012        1      1
0   1   an    2.3  2012        1      2
0   1   an    2.3  2012        1      3
1   2   yu    3.5  2012        2      4
1   2   yu    3.5  2012        2      5
1   2   yu    3.5  2012        2      6
2   3   ij    3.1  2013        4     10
2   3   ij    3.1  2013        4     11
2   3   ij    3.1  2013        4     12
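A self-contained sketch of the repeat approach on the four-row sample. It also computes, for comparison only, the variant that counts within quarter: because ids 1 and 4 share quarter 1, that counter runs 0 to 5 across their six copies and id 4 ends up with months 4 to 6.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'name': ['an', 'yu', 'ij', 'ij'],
    'value': [2.3, 3.5, 3.1, 2.1],
    'year': [2012, 2012, 2013, 2013],
    'quarter': [1, 2, 4, 1],
})

# Repeat each row three times; the copies keep their original index label.
dff = df.loc[df.index.repeat(3)]

# cumcount() per original row (index level 0) numbers the copies 0, 1, 2.
offset = dff.groupby(level=0).cumcount()
out = dff.assign(month=(dff['quarter'] - 1) * 3 + offset + 1).reset_index(drop=True)
print(out)

# For comparison: counting within 'quarter' mixes ids 1 and 4 (both Q1).
wrong = (dff['quarter'] - 1) * 3 + dff.groupby('quarter').cumcount() + 1
print(wrong.tolist())
```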

Using reindex to repeat the rows, pd.to_datetime to get the first month of each quarter from strings such as "2012Q1", and then adding the cumcount within each subgroup:

df = df.reindex(df.index.repeat(3))
df['Month'] = pd.to_datetime(df[['year', 'quarter']].astype(str).apply('Q'.join, axis=1)).dt.month + df.groupby(level=0).cumcount()
df
Out[1258]: 
   id name  value  year  quarter  Month
0   1   an    2.3  2012        1      1
0   1   an    2.3  2012        1      2
0   1   an    2.3  2012        1      3
1   2   yu    3.5  2012        2      4
1   2   yu    3.5  2012        2      5
1   2   yu    3.5  2012        2      6
2   3   ij    3.1  2013        4     10
2   3   ij    3.1  2013        4     11
2   3   ij    3.1  2013        4     12
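The to_datetime answer depends on pandas parsing strings such as "2012Q1". A sketch that gets the same first month arithmetically, (quarter - 1) * 3 + 1, and then, assuming the two-digit strings in the question's expected output ("01", "02", ...) are actually wanted, zero-pads the month:

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'name': ['an', 'yu', 'ij', 'ij'],
    'value': [2.3, 3.5, 3.1, 2.1],
    'year': [2012, 2012, 2013, 2013],
    'quarter': [1, 2, 4, 1],
})

# Repeat each row three times (the index labels repeat too).
df = df.reindex(df.index.repeat(3))

# First month of the quarter, computed without any date parsing.
first_month = (df['quarter'] - 1) * 3 + 1
df['month'] = first_month + df.groupby(level=0).cumcount()

# Assumption: two-digit month strings, as shown in the expected output.
df['month'] = df['month'].astype(str).str.zfill(2)
df = df.reset_index(drop=True)
print(df)
```

Keep the month as an integer instead if it will be used for further arithmetic or to build dates.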

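Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.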

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM