简体   繁体   中英

How do I add one row of a PANDAS dataframe to the rest of the rows?

If a pandas dataframe df contains:

    A    B    C    D
a   1    2    3    4
b   2    NaN  NaN  5
c   NaN  7    NaN  2
d   NaN  2    4    3

How do I add the first row to all the rest of the rows, only where they contain a number, to get the resulting dataframe:

    A    B    C    D
b   3    NaN  NaN  9
c   NaN  9    NaN  6
d   NaN  4    7    7

I plan to do this, then make a dictionary of the row names and divide the product of the columns each row of the first table by the same row in the second table, saving the values in the dictionary. I have working code that does this (below), but I worry that it's not "PANDAS" enough and that it's overly complicated for the simple task I want to perform. Do I have the optimal solution, or am I missing something obvious?

If the Pandas code still has to iterate over the rows, then it's not worth it, but I feel like there should be a way to do this in-place.

Code:

import numpy as np
import pandas as pd

dindex = [1,2,3] #indices of drugs to select (set this)

def get_drugs(): #generates random "drug characteristics" as pandas df
    cduct = ['dose','g1','g2','g3','g4','g5']
    drg = ['d1','d2','d3','d4']
    return pd.DataFrame(abs(np.random.randn(6,4)),index=cduct,columns=drg)

def sel_drugs(dframe, selct): #removes unwanted drugs from df.
    #Pass df and dindex to this function
    return dframe.iloc[:,selct].values, dframe[1:].index.tolist()
    #returns a tuple of [values, names]

def cal_conduct(val, cnames): #calculates conductance scaling.
    #Pass values and names to this function
    cduct = {} #initialize dict
    for ix, gname in enumerate(cnames):
        _top = val[ix+1]; _bot = val[0]+val[ix+1]
        cduct[gname] = (np.product(_top[np.isfinite(_top)])/
                        np.product(_bot[np.isfinite(_bot)]))
    return cduct #return a dictionary of scaling factors

def main():
    selection =  sel_drugs(get_drugs(),dindex)
    print cal_conduct(selection[0], selection[1])

main()

Pandas automatically aligns/broadcasts so this is straightforward

In [8]: df
Out[8]: 
    A   B   C  D
a   1   2   3  4
b   2 NaN NaN  5
c NaN   7 NaN  2
d NaN   2   4  3

In [11]: df.iloc[1:] + df.iloc[0]
Out[11]: 
    A   B   C  D
b   3 NaN NaN  9
c NaN   9 NaN  6
d NaN   4   7  7

The second part is this if I am reading it correctly

In [12]: df2 = df.iloc[1:] + df.iloc[0]

In [13]: df.prod()
Out[13]: 
A      2
B     28
C     12
D    120
dtype: float64

In [14]: df2/df.prod()
Out[14]: 
     A         B         C         D
b  1.5       NaN       NaN  0.075000
c  NaN  0.321429       NaN  0.050000
d  NaN  0.142857  0.583333  0.058333

Here is some code based on what @Jeff answered. It is about 40% slower, at least with the small test data, but it is simpler.

import numpy as np
import pandas as pd

dindex = [1,2,3] #indices of drugs to select (set this)

def get_drugs(): #generates random "drug characteristics" as pandas df
    cduct = ['dose','g1','g2','g3','g4','g5']
    drg = ['d1','d2','d3','d4']
    return pd.DataFrame(abs(np.random.randn(6,4)),index=cduct,columns=drg)

def cal_conduct(frame,selct): #calculates conductance scaling.
    #Pass df with selections made
    s = frame.iloc[:,selct]
    cduct = s.iloc[1:].prod(1)/(s.iloc[0]+s.iloc[1:]).prod(1)
    return cduct.to_dict() #return a dictionary of scaling factors

def main():
    scaling = cal_conduct(get_drugs(), dindex)
    print scaling

main()

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM