
How to consolidate/divide rows within a data frame based on a value within a certain column using pandas?

The starting df is as follows:

company  metric  time   data
x        X10384  M1     100
x        X10384  M2     100
x        X10384  M3     100
y        X10456  M4     200
y        X10456  M5     200
y        X10456  M6     200

I need to be able to consolidate these rows based on the value of the time column. Basically, M1, M2 and M3 make up Q1, M4, M5 and M6 make up Q2, and so on.

The resulting df needs to be as follows:

company  metric  time   data
x        X10384  Q1     300
y        X10456  Q2     600

Similarly, if starting with a df already in quarters, I need to be able to split the time into months and split the data into three equal parts, as in the initial df.

How would one go about transforming this data as described above? Below is my starting point for reference:

quarters = ['Q1', 'Q2']
months = ['M1', 'M2', 'M3', 'M4', 'M5', 'M6']

for time in df['time']:
    if time in quarters:
        pass  # TODO: split this quarter into its individual months
    elif time in months:
        pass  # TODO: consolidate these months into their quarter
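The row-by-row `if`/`elif` loop above can also be expressed with vectorized boolean masks. Below is a minimal sketch, assuming month labels are always `M<n>`, quarter labels are always `Q<n>`, and the lowercase column names from the tables above; `convert_time` is a hypothetical helper name:

```python
import pandas as pd


def convert_time(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: sum months into quarters and split quarters into months."""
    keys = ['company', 'metric']
    is_quarter = df['time'].str.startswith('Q')
    # Numeric part of each label, e.g. 'M5' -> 5, 'Q2' -> 2
    num = df['time'].str[1:].astype(int)

    # Month rows: relabel with quarter (month - 1) // 3 + 1, then sum per group
    month_rows = df[~is_quarter].assign(
        time='Q' + ((num[~is_quarter] - 1) // 3 + 1).astype(str)
    )
    summed = month_rows.groupby(keys + ['time'], as_index=False)['data'].sum()

    # Quarter rows: quarter q covers months 3q-2, 3q-1 and 3q; data split into thirds
    quarter_rows = df[is_quarter].copy()
    quarter_rows['time'] = num[is_quarter].map(
        lambda q: [f'M{3 * q - 2}', f'M{3 * q - 1}', f'M{3 * q}']
    )
    quarter_rows['data'] = quarter_rows['data'] / 3
    split = quarter_rows.explode('time', ignore_index=True)

    return pd.concat([summed, split], ignore_index=True)
```

For example, a frame holding M1 to M3 for one company and Q2 for another should come back as a single Q1 total for the first and three M4 to M6 rows (each a third of the quarter) for the second. Month rows and quarter rows are processed separately and concatenated, so the converted month rows come first in the output.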

Creating a dataframe based on your data:

import pandas as pd

data = {'Company': ['x', 'x', 'x', 'y', 'y', 'y'],
        'Metric': ['X10384', 'X10384', 'X10384', 'X10456', 'X10456', 'X10456'],
        'time': ['M1', 'M2', 'M3', 'M4', 'M5', 'M6'],
        'data': [100, 100, 100, 200, 200, 200]}
df = pd.DataFrame(data)

Then create a dictionary mapping each month to its quarter and apply it to the time column with map:

month_to_quarter = {'M1': 'Q1', 'M2': 'Q1', 'M3': 'Q1', 'M4': 'Q2', 'M5': 'Q2', 'M6': 'Q2'}
df['time'] = df['time'].map(month_to_quarter)

A groupby then gives you the final result:

df.groupby(['Company','Metric','time']).sum().reset_index()
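Typing the dictionary by hand gets tedious for a full year. A minimal sketch, assuming labels run from M1 to M12, builds the same month-to-quarter mapping with a dict comprehension and then applies the map + groupby from this answer (the name `month_to_quarter` is illustrative):

```python
import pandas as pd

# Build {'M1': 'Q1', ..., 'M12': 'Q4'}: month m belongs to quarter (m - 1) // 3 + 1
month_to_quarter = {f'M{m}': f'Q{(m - 1) // 3 + 1}' for m in range(1, 13)}

df = pd.DataFrame({
    'Company': ['x', 'x', 'x', 'y', 'y', 'y'],
    'Metric': ['X10384', 'X10384', 'X10384', 'X10456', 'X10456', 'X10456'],
    'time': ['M1', 'M2', 'M3', 'M4', 'M5', 'M6'],
    'data': [100, 100, 100, 200, 200, 200],
})

# Relabel months as quarters, then sum the data within each group
df['time'] = df['time'].map(month_to_quarter)
result = df.groupby(['Company', 'Metric', 'time']).sum().reset_index()
```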

Extract the digits from the time column, then convert them to the quarter number. Finally, a simple groupby/sum does the job:

# Convert M1..M12 to Q1..Q4: quarter = (month - 1) // 3 + 1
to_quarter = df['time'].str[1:].astype(int).sub(1).floordiv(3).add(1).astype(str).radd('Q')

# Replace the month labels with quarters, then sum the data per group
out = df.assign(time=to_quarter).groupby(['company', 'metric', 'time']) \
                                .sum().reset_index()
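The month-number arithmetic is easy to get subtly wrong, so it is worth checking against all twelve months. A quick sketch, assuming labels M1 to M12, compares the `(n - 1) // 3 + 1` rule with a variant that divides by 4, which happens to agree for M1 to M6 but first goes wrong at M7:

```python
import pandas as pd

months = pd.Series([f'M{m}' for m in range(1, 13)])
n = months.str[1:].astype(int)

# Three months per quarter: (n - 1) // 3 + 1
by_three = n.sub(1).floordiv(3).add(1).astype(str).radd('Q')
# Variant dividing by 4: agrees for M1..M6 but first goes wrong at M7
by_four = n.floordiv(4).add(1).astype(str).radd('Q')

print(pd.DataFrame({'month': months, 'by_three': by_three, 'by_four': by_four}))
```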

Output:

>>> out
  company  metric time  data
0       x  X10384   Q1   300
1       y  X10456   Q2   600

From months to quarters

input

company  metric  time   data
x        X10384  M1     100
x        X10384  M2     100
x        X10384  M3     100
y        X10456  M4     200
y        X10456  M5     200
y        X10456  M6     200

Create a dictionary with months as keys and quarters as values, and map the time column through it:

months_to_quarters_dict = {'M1': 'Q1', 'M2' : 'Q1', 'M3' : "Q1", 'M4' : 'Q2', 'M5' : 'Q2', 'M6' : 'Q2'}
df['time'] = df['time'].map(months_to_quarters_dict)

output (1a)

company  metric  time   data
x        X10384  Q1     100
x        X10384  Q1     100
x        X10384  Q1     100
y        X10456  Q2     200
y        X10456  Q2     200
y        X10456  Q2     200

Use groupby().agg('sum') to get the condensed df:

df.groupby(['company', 'metric', 'time'], as_index=False).agg('sum')

output (1b)

company  metric  time   data
x        X10384  Q1     300
y        X10456  Q2     600

From quarters to months

input

company  metric  time   data
x        X10384  Q1     300
y        X10456  Q2     600

Create a dictionary with quarters as keys and lists of months as values, and map the time column through it:

quarters_to_months_dict = {'Q1': ['M1', 'M2', 'M3'], 'Q2': ['M4', 'M5', 'M6']}
df['time'] = df['time'].map(quarters_to_months_dict)

output (2a)

  company  metric          time  data
0       x  X10384  [M1, M2, M3]   300
1       y  X10456  [M4, M5, M6]   600

Split the rows using explode on the time column, then divide the data column by 3 so that each month of the quarter gets an equal share:

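# One row per month; the original index labels repeat for each exploded row
df = df.explode('time')
# div returns floats, so each month gets 100.0 rather than 100
df['data'] = df['data'].div(3)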

output (2b)

company  metric  time   data
x        X10384  M1     100.0
x        X10384  M2     100.0
x        X10384  M3     100.0
y        X10456  M4     200.0
y        X10456  M5     200.0
y        X10456  M6     200.0
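Both directions can be chained into a round trip, which is a handy check that splitting a quarter into thirds and summing it back recovers the original totals. A sketch, assuming quarter labels Q1 to Q4 and the lowercase column names from the input tables above; here `quarters_to_months` and `months_to_quarters` are built with comprehensions rather than typed out, and both names are illustrative:

```python
import pandas as pd

# Build {'Q1': ['M1', 'M2', 'M3'], ..., 'Q4': ['M10', 'M11', 'M12']}
quarters_to_months = {
    f'Q{q}': [f'M{m}' for m in range(3 * q - 2, 3 * q + 1)] for q in range(1, 5)
}
# Invert it: {'M1': 'Q1', ..., 'M12': 'Q4'}
months_to_quarters = {m: q for q, ms in quarters_to_months.items() for m in ms}

quarterly = pd.DataFrame({
    'company': ['x', 'y'],
    'metric': ['X10384', 'X10456'],
    'time': ['Q1', 'Q2'],
    'data': [300, 600],
})

# Quarters -> months: one row per month, data split evenly (as floats)
monthly = (quarterly.assign(time=quarterly['time'].map(quarters_to_months),
                            data=quarterly['data'] / 3)
           .explode('time', ignore_index=True))

# Months -> quarters again: sum the thirds back up
back = (monthly.assign(time=monthly['time'].map(months_to_quarters))
        .groupby(['company', 'metric', 'time'], as_index=False)['data'].sum())
```

Because the split uses true division, totals that are not multiples of 3 come back as floats that may differ from the original in the last decimal place; comparing with a tolerance is safer in that case.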

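Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.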

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM