How to consolidate/divide rows within a data frame based on a value within a certain column using pandas?
The starting df is as follows:
company metric time data
x X10384 M1 100
x X10384 M2 100
x X10384 M3 100
y X10456 M4 200
y X10456 M5 200
y X10456 M6 200
I need to be able to consolidate these rows based on the value of the time dimension. Basically, "M1, M2, M3" make up Q1, "M4, M5, M6" make up Q2, and so on.
The resulting df will need to be as follows:
company metric time data
x X10384 Q1 300
y X10456 Q2 600
Similarly, if starting with a df already in quarters, I will need to be able to split the time into months and split the data into three equal parts, as in the initial df.
How would one go about transforming the data as above? The below is my starting point for reference:
quarters = ['Q1', 'Q2']
months = ['M1','M2','M3','M4','M5','M6']
for time in df['time']:
    if time in quarters:
        [insert transformation into individual months]
    elif time in months:
        [insert transformation into quarters]
Creating a dataframe based on your data:
import pandas as pd

data = {'Company': ['x', 'x', 'x', 'y', 'y', 'y'],
        'Metric': ['X10384', 'X10384', 'X10384', 'X10456', 'X10456', 'X10456'],
        'time': ['M1', 'M2', 'M3', 'M4', 'M5', 'M6'],
        'data': [100, 100, 100, 200, 200, 200]}
df = pd.DataFrame(data)
Then create a dictionary from month to quarter and map the time column through it:
# Avoid naming the variable "dict", which would shadow the built-in type
month_to_quarter = {'M1': 'Q1', 'M2': 'Q1', 'M3': 'Q1', 'M4': 'Q2', 'M5': 'Q2', 'M6': 'Q2'}
df['time'] = df['time'].map(month_to_quarter)
And groupby will give you the final result:
df.groupby(['Company','Metric','time']).sum().reset_index()
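One thing to watch with the dictionary approach: Series.map returns NaN for any label that is missing from the dictionary, and groupby drops NaN keys by default, so an unmapped month would silently disappear from the result. A minimal sketch of a guard, assuming the same column names as above (the 'M7' row and the names month_to_quarter and unmapped are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'Company': ['x', 'x', 'y'],
    'Metric': ['X10384', 'X10384', 'X10456'],
    'time': ['M1', 'M2', 'M7'],   # 'M7' is deliberately absent from the mapping
    'data': [100, 100, 200],
})
month_to_quarter = {'M1': 'Q1', 'M2': 'Q1', 'M3': 'Q1',
                    'M4': 'Q2', 'M5': 'Q2', 'M6': 'Q2'}

# Map first, then look for rows whose label had no entry in the dictionary
mapped = df['time'].map(month_to_quarter)
unmapped = sorted(df.loc[mapped.isna(), 'time'].unique())
if unmapped:
    print('labels missing from the mapping:', unmapped)
```

Running a check like this before the groupby makes it obvious when the dictionary needs more entries.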
Extract the digits from the time column, then convert them to quarter numbers. Finally, a simple groupby followed by sum does the job:
# Convert M1, M2, M3, M4, M5, M6 to Q1, Q1, Q1, Q2, Q2, Q2
# (month - 1) // 3 + 1 gives the quarter; floordiv(4) alone would mislabel M7, M10 and M11
to_quarter = df['time'].str[1:].astype(int).sub(1).floordiv(3).add(1).astype(str).radd('Q')
out = df.assign(time=to_quarter).groupby(['company', 'metric', 'time']) \
        .sum().reset_index()
Output:
>>> out
company metric time data
0 x X10384 Q1 300
1 y X10456 Q2 600
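The sample data only covers M1 to M6, so it is worth checking the month-to-quarter arithmetic over a full year. A small sketch, assuming the labels run from M1 to M12 and using (m - 1) // 3 + 1 as the quarter number (the names months, quarters and lookup are illustrative only):

```python
import pandas as pd

# Labels M1..M12; assumes every label is the letter M followed by the month number
months = pd.Series([f'M{i}' for i in range(1, 13)])
month_number = months.str[1:].astype(int)

# (m - 1) // 3 + 1 maps 1..3 -> 1, 4..6 -> 2, 7..9 -> 3, 10..12 -> 4
quarters = 'Q' + ((month_number - 1) // 3 + 1).astype(str)

lookup = dict(zip(months, quarters))
print(lookup)
```

The boundary months (M3/M4, M6/M7, M9/M10) are the ones to inspect, since an off-by-one in the divisor only shows up there.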
From months to quarters
input
company metric time data
x X10384 M1 100
x X10384 M2 100
x X10384 M3 100
y X10456 M4 200
y X10456 M5 200
y X10456 M6 200
Create a dictionary that maps each month (key) to its quarter (value), then map the time column through it:
months_to_quarters_dict = {'M1': 'Q1', 'M2' : 'Q1', 'M3' : "Q1", 'M4' : 'Q2', 'M5' : 'Q2', 'M6' : 'Q2'}
df['time'] = df['time'].map(months_to_quarters_dict)
output (1a)
company metric time data
x X10384 Q1 100
x X10384 Q1 100
x X10384 Q1 100
y X10456 Q2 200
y X10456 Q2 200
y X10456 Q2 200
Use groupby().agg('sum') to get the condensed df:
df.groupby(['company', 'metric', 'time'], as_index=False).agg('sum')
output (1b)
company metric time data
x X10384 Q1 300
y X10456 Q2 600
From quarters to months
input
company metric time data
x X10384 Q1 300
y X10456 Q2 600
Create a dictionary that maps each quarter (key) to its list of months (value), then map the time column through it:
quarters_to_months_dict = {'Q1' : ['M1', 'M2', 'M3'], 'Q2' : ['M4', 'M5', 'M6']}
df['time'] = df['time'].map(quarters_to_months_dict)
output (2a)
company metric time data
0 x X10384 ['M1', 'M2', 'M3'] 300
1 y X10456 ['M4', 'M5', 'M6'] 600
Split the rows using explode on the time column, then divide the data column by 3 so that each month in a quarter gets an equal share:
df = df.explode('time')
df['data'] = df['data'].div(3)
output (2b)
company metric time data
x X10384 M1 100
x X10384 M2 100
x X10384 M3 100
y X10456 M4 200
y X10456 M5 200
y X10456 M6 200
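Both directions can be wrapped into a pair of small functions and checked with a round trip. This is only a sketch under the assumptions of the example: lowercase column names as in the tables above, two quarters, and an equal three-way split. The names QUARTER_TO_MONTHS, MONTH_TO_QUARTER, to_quarters and to_months are made up for illustration.

```python
import pandas as pd

QUARTER_TO_MONTHS = {'Q1': ['M1', 'M2', 'M3'], 'Q2': ['M4', 'M5', 'M6']}
# Invert the mapping so both directions come from a single definition
MONTH_TO_QUARTER = {m: q for q, ms in QUARTER_TO_MONTHS.items() for m in ms}
KEYS = ['company', 'metric', 'time']

def to_quarters(df):
    """Sum monthly rows into one row per company, metric and quarter."""
    out = df.assign(time=df['time'].map(MONTH_TO_QUARTER))
    return out.groupby(KEYS, as_index=False)['data'].sum()

def to_months(df):
    """Split each quarterly row into three monthly rows with equal data."""
    out = df.assign(time=df['time'].map(QUARTER_TO_MONTHS),
                    data=df['data'] / 3)
    return out.explode('time').reset_index(drop=True)

monthly = pd.DataFrame({
    'company': ['x'] * 3 + ['y'] * 3,
    'metric': ['X10384'] * 3 + ['X10456'] * 3,
    'time': ['M1', 'M2', 'M3', 'M4', 'M5', 'M6'],
    'data': [100, 100, 100, 200, 200, 200],
})
quarterly = to_quarters(monthly)
round_trip = to_months(quarterly)
print(quarterly)
print(round_trip)
```

Note that dividing by 3 turns the data column into floats, and the round trip only reproduces the original monthly values when the months within a quarter were equal to begin with.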