ValueError when filling pandas dataframe cell with Numpy array

The goal is to make a Pandas Series in which every element is a variable-length NumPy array. These arrays come from a function getContexts, which takes a boolean mask built from one dataframe (cnv) and applies it to another dataframe (exp). This is done twice: once for the True condition (loss) and once for the False condition (no_loss). The error I get is "ValueError: setting an array element with a sequence", occurring at the second line of getContexts.

Here is some test data to try out:

deldf = pd.DataFrame([[0,1,0,1],
                     [1,0,1,0],
                     [1,1,1,0]])
deldf.columns = ['a','b','c','d']
deldf['cnv'] = ['k','l','m']
deldf.set_index(deldf['cnv'], inplace=True, drop=True)
del deldf['cnv']

d_mask = deldf == 1

expdf = pd.DataFrame([[0,2,1,4,np.array([1,1,1])],
                     [10,0,12,1,np.array([2,2,2])],
                     [1,1,1,1,np.array([3,3,3])]])
expdf.columns = ['a','b','c','d','arr']
expdf['exp'] = ['x','y','Z']
expdf.set_index(expdf['exp'], inplace=True, drop=True)
del expdf['exp']

results = pd.DataFrame(dels.index)
results['exp'] = expdf.index
results.columns = ['cnv','exp']

Here is my attempt at the solution (note that d_mask is a global variable):

def getContexts(exp_g, cnv_gm):
    lossTrue = d_mask.loc[cnv_g]
    # error is thrown at line below
    loss = np.array(expdf.loc[exp_g].where(lossTrue, np.nan).dropna())
    no_loss = np.array(expdf.loc[exp_g].where(~lossTrue, np.nan).dropna())
    return loss, no_loss

Here is my call to getContexts:

results['loss'], results['no_loss'] = np.vectorize(getContexts)(results['exp'], results['cnv'])

The end result should look like the dataframe below, so that I can check the variance, length, mean, and effect size of the two columns of arrays. (Image of the expected dataframe not available.)

Your code seems to have some reference errors. After I changed dels to deldf and cnv_g to cnv_gm, it no longer throws errors.

import numpy as np
import pandas as pd

deldf = pd.DataFrame([[0,1,0,1],
                     [1,0,1,0],
                     [1,1,1,0]])
deldf.columns = ['a','b','c','d']
deldf['cnv'] = ['k','l','m']
deldf.set_index(deldf['cnv'], inplace=True, drop=True)
del deldf['cnv']

d_mask = deldf == 1

expdf = pd.DataFrame([[0,2,1,4,np.array([1,1,1])],
                     [10,0,12,1,np.array([2,2,2])],
                     [1,1,1,1,np.array([3,3,3])]])
expdf.columns = ['a','b','c','d','arr']
expdf['exp'] = ['x','y','Z']
expdf.set_index(expdf['exp'], inplace=True, drop=True)
del expdf['exp']

results = pd.DataFrame(deldf.index)
results['exp'] = expdf.index
results.columns = ['cnv','exp']

def getContexts(exp_g, cnv_gm):
    lossTrue = d_mask.loc[cnv_gm]
    # keep the values where the mask is True (loss) or False (no_loss)
    loss = np.array(expdf.loc[exp_g].where(lossTrue, np.nan).dropna())
    no_loss = np.array(expdf.loc[exp_g].where(~lossTrue, np.nan).dropna())
    return loss, no_loss

results['loss'], results['no_loss'] = np.vectorize(getContexts)(results['exp'], results['cnv'])
print(results)

  cnv exp       loss no_loss
0   k   x     [2, 4]  [0, 1]
1   l   y   [10, 12]  [0, 1]
2   m   Z  [1, 1, 1]     [1]
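
The renamed code works on this test data because the 'arr' column makes every row object-typed. With purely numeric frames, np.vectorize infers the output type from the first call, and a later attempt to store arrays in a numeric output is one plausible source of "setting an array element with a sequence". The sketch below is an assumption-laden illustration rather than the asker's actual setup: it drops the 'arr' column, assumes both frames share the same columns in the same order, and uses a hypothetical helper name get_contexts. It shows two ways to sidestep the type inference: passing otypes to np.vectorize, or building the columns with a plain loop.

```python
import numpy as np
import pandas as pd

# Numeric-only versions of the test frames (the 'arr' column is omitted here).
deldf = pd.DataFrame([[0, 1, 0, 1], [1, 0, 1, 0], [1, 1, 1, 0]],
                     columns=['a', 'b', 'c', 'd'], index=['k', 'l', 'm'])
expdf = pd.DataFrame([[0, 2, 1, 4], [10, 0, 12, 1], [1, 1, 1, 1]],
                     columns=['a', 'b', 'c', 'd'], index=['x', 'y', 'Z'])
d_mask = deldf == 1
results = pd.DataFrame({'cnv': deldf.index, 'exp': expdf.index})


def get_contexts(exp_g, cnv_g):
    # Assumption: both frames have identical columns in identical order,
    # so positional boolean indexing lines up without label alignment.
    mask = d_mask.loc[cnv_g].to_numpy()
    row = expdf.loc[exp_g].to_numpy()
    return row[mask], row[~mask]


# Option 1: declare both outputs as generic Python objects up front,
# so np.vectorize does not try to infer a numeric output type.
vec = np.vectorize(get_contexts, otypes=[object, object])
loss_v, no_loss_v = vec(results['exp'], results['cnv'])

# Option 2: a plain loop, then object-typed Series holding one array per row.
pairs = [get_contexts(e, c) for e, c in zip(results['exp'], results['cnv'])]
results['loss'] = pd.Series([p[0] for p in pairs],
                            index=results.index, dtype=object)
results['no_loss'] = pd.Series([p[1] for p in pairs],
                               index=results.index, dtype=object)

# Per-row statistics on a column of arrays, e.g. length and mean.
summary = pd.DataFrame({
    'loss_len': results['loss'].map(len),
    'loss_mean': results['loss'].map(np.mean),
})
print(results)
print(summary)
```

Both options store one NumPy array per cell; np.vectorize is essentially a loop, so the list comprehension is not expected to be slower and avoids the type inference entirely.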

