簡體   English   中英

對groupby pandas數據框的算術運算

[英]Arithmetic operation on a groupby pandas dataframe

我有一個40列和400000行的熊貓數據框。 我在3列上創建了一個匯總數據集。

現在,我需要根據其中兩列來計算百分比指標。 Python引發錯誤-

unsupported operand type(s) for /: 'SeriesGroupBy' and 'SeriesGroupBy'

這是示例代碼:

print sample_data
   date  part  receipt  bad_dollars  total_dollars  bad_percent
0     1   123       22           40            100          NaN
1     2   456       44           80            120          NaN
2     3   134       33           30            150          NaN
3     1   123       22           80            100          NaN
4     5   456       45           40             90          NaN
5     3   134       33           85            150          NaN
6     7   123       24           70            120          NaN
7     5   456       45           20             85          NaN
8     9   134       35           50            300          NaN
9     7   123       24          300            600          NaN

sample_data_group = sample_data.groupby(['date','part','receipt'])

sample_data_group['bad_percents']=sample_data_group['bad_dollars']/sample_data_group['total_dollars']

TypeError: unsupported operand type(s) for /: 'SeriesGroupBy' and 'SeriesGroupBy'

請幫忙!

您可以對groupby對象應用apply來執行此操作:

import pandas as pd
import numpy as np

cols = ['index', 'date',  'part',  'receipt',  'bad_dollars',  'total_dollars',
        'bad_percent']
sample_data = pd.DataFrame([
[0,     1,   123,       22,           40,            100,          np.nan],
[1,     2,   456,       44,           80,            120,          np.nan],
[2,     3,   134,       33,           30,            150,          np.nan],
[3,     1,   123,       22,           80,            100,          np.nan],
[4,     5,   456,       45,           40,             90,          np.nan],
[5,     3,   134,       33,           85,            150,          np.nan],
[6,     7,   123,       24,           70,            120,          np.nan],
[7,     5,   456,       45,           20,             85,          np.nan],
[8,     9,   134,       35,           50,            300,          np.nan],
[9,     7,   123,       24,          300,            600,          np.nan]],
                           columns = cols).set_index('index', drop = True)

sample_data_group = sample_data.groupby(['date','part','receipt'])

xx = sample_data_group.apply(
         lambda x: x.assign(bad_percent = x.bad_dollars/x.total_dollars))\
                      .reset_index(['date','part', 'receipt'], drop = True)

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM