Efficiently adding calculated rows based on index values to a pandas DataFrame

I have a pandas DataFrame in the following format:

     a   b   c
0    0   1   2
1    3   4   5
2    6   7   8
3    9  10  11
4   12  13  14
5   15  16  17

I want to append a calculated row that performs some math based on a given item's index value, e.g. adding a row that sums the values of all items with an index value < 2, with the new row having an index label of 'Red'. Ultimately, I am trying to add three rows that group the index values into categories:

  • A row with the sum of item values where index values are < 2, labeled as 'Red'
  • A row with the sum of item values where index values are 1 < x < 4, labeled as 'Blue'
  • A row with the sum of item values where index values are > 3, labeled as 'Green'

Ideal output would look like this:

       a   b   c
0      0   1   2
1      3   4   5
2      6   7   8
3      9  10  11
4     12  13  14
5     15  16  17
Red    3   5   7
Blue  15  17  19
Green 27  29  31

My current solution involves transposing the DataFrame, applying a map function for each calculated column and then re-transposing, but I would imagine pandas has a more efficient way of doing this, likely using .append().

EDIT: My inelegant pre-set list solution (originally used .transpose(), but I improved it using .groupby() and .append()):

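import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(18).reshape((6, 3)), columns=['a', 'b', 'c'])
df['x'] = ['Red', 'Red', 'Blue', 'Blue', 'Green', 'Green']  # pre-set group label for each row
df2 = df.groupby('x').sum()  # one summed row per label
df = df.append(df2)  # removed in pandas 2.0; there, use pd.concat([df, df2])
del df['x']  # drop the helper column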

I much prefer the flexibility of BrenBarn's answer (see below).

Here is one way:

def group(ix):
    if ix < 2:
        return "Red"
    elif 2 <= ix < 4:
        return "Blue"
    else:
        return "Green"

>>> print(d)
    a   b   c
0   0   1   2
1   3   4   5
2   6   7   8
3   9  10  11
4  12  13  14
5  15  16  17
>>> print(d.append(d.groupby(d.index.to_series().map(group)).sum()))
        a   b   c
0       0   1   2
1       3   4   5
2       6   7   8
3       9  10  11
4      12  13  14
5      15  16  17
Blue   15  17  19
Green  27  29  31
Red     3   5   7

For the general case, you need to define a function (or dict) to handle the mapping to different groups. Then you can just use groupby and its usual abilities.
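As a rough sketch of the dict variant mentioned above: groupby also accepts a mapping from index label to group name. The `groups` dict below is a hypothetical lookup table made up for this example, and `pd.concat` stands in for `.append()`, which was removed in pandas 2.0.

```python
import numpy as np
import pandas as pd

d = pd.DataFrame(np.arange(18).reshape((6, 3)), columns=["a", "b", "c"])

# Hypothetical lookup table: index label -> group name.
groups = {0: "Red", 1: "Red", 2: "Blue", 3: "Blue", 4: "Green", 5: "Green"}

# groupby looks each index label up in the dict.
# sort=False keeps the groups in order of first appearance.
sums = d.groupby(groups, sort=False).sum()

# Stack the summary rows under the original rows.
result = pd.concat([d, sums])
print(result)
```

A dict is handy when the assignment is an arbitrary lookup rather than a rule; a function like group() is shorter when the rule is a simple comparison.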

For your particular case, it can be done more simply by slicing directly on the index values, as Dan Allan showed, but that breaks down in more complex cases where the groups you want cannot be defined as contiguous blocks of rows. The method above also extends easily to situations where the groups are based not on the index but on some other column (i.e., group together all rows whose value in column X is within the range 0-10, or whatever).
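To illustrate the column-based case, here is a minimal sketch that bins rows by their value in column a instead of by index. The bin edges and the name `key` are assumptions chosen so that the result matches the Red/Blue/Green sums from the question; `pd.cut` is just one convenient way to build the grouping key.

```python
import numpy as np
import pandas as pd

d = pd.DataFrame(np.arange(18).reshape((6, 3)), columns=["a", "b", "c"])

# Illustrative bins on column "a": [0, 6), [6, 12), [12, 18).
key = pd.cut(d["a"], bins=[0, 6, 12, 18], labels=["Red", "Blue", "Green"], right=False)

# Use plain strings as group names and give the key its own name
# so it is not confused with column "a".
key = key.astype(str).rename("group")

# Group by the derived key; every column, including "a", is summed.
sums = d.groupby(key, sort=False).sum()

result = pd.concat([d, sums])
print(result)
```

Only the construction of the key changes; the groupby and the final concatenation are the same as in the index-based version.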

The role of "transpose," which you say you used in your unshown solution, might be played more naturally by the orient keyword argument, which is available when you construct a DataFrame from a dictionary with DataFrame.from_dict.

In [23]: df
Out[23]: 
    a   b   c
0   0   1   2
1   3   4   5
2   6   7   8
3   9  10  11
4  12  13  14
5  15  16  17

In [24]: sums = {'Red': df.loc[:1].sum(),
                 'Blue': df.loc[2:3].sum(),
                 'Green': df.loc[4:].sum()}

In [25]: DataFrame.from_dict(sums, orient='index')
Out[25]: 
        a   b   c
Blue   15  17  19
Green  27  29  31
Red     3   5   7

In [26]: df.append(_)
Out[26]: 
        a   b   c
0       0   1   2
1       3   4   5
2       6   7   8
3       9  10  11
4      12  13  14
5      15  16  17
Blue   15  17  19
Green  27  29  31
Red     3   5   7

Based on the numbers in your example, I assume that by "> 4" you actually meant ">= 4".
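For anyone running this on a recent pandas, the same idea can be sketched as a standalone script. `pd.concat` stands in for `.append()`, which was removed in pandas 2.0, and the names `sums` and `summary` are only illustrative. The order of the summary rows may differ from the session above depending on the pandas version.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(18).reshape((6, 3)), columns=["a", "b", "c"])

# .loc slices by label and includes both endpoints.
sums = {
    "Red": df.loc[:1].sum(),
    "Blue": df.loc[2:3].sum(),
    "Green": df.loc[4:].sum(),
}

# orient="index" turns each dict key into a row label.
summary = pd.DataFrame.from_dict(sums, orient="index")

# Stack the summary rows under the original rows.
result = pd.concat([df, summary])
print(result)
```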
