The goal is to make a Pandas Series where every element is a variable-length numpy array. These arrays come from a function getContexts, which gets the masked results of one dataframe cnv and applies it to another dataframe exp. This is done twice: once for the True (loss), and once for the False (no_loss) condition. The error I get is ValueError: setting an array element with a sequence, occurring at the second line in getContexts.
Here is some test data to try out:
deldf = pd.DataFrame([[0,1,0,1],
                      [1,0,1,0],
                      [1,1,1,0]])
deldf.columns = ['a','b','c','d']
deldf['cnv'] = ['k','l','m']
deldf.set_index(deldf['cnv'], inplace=True, drop=True)
del deldf['cnv']
d_mask = deldf == 1

expdf = pd.DataFrame([[0,2,1,4,np.array([1,1,1])],
                      [10,0,12,1,np.array([2,2,2])],
                      [1,1,1,1,np.array([3,3,3])]])
expdf.columns = ['a','b','c','d','arr']
expdf['exp'] = ['x','y','Z']
expdf.set_index(expdf['exp'], inplace=True, drop=True)
del expdf['exp']

results = pd.DataFrame(dels.index)
results['exp'] = expdf.index
results.columns = ['cnv','exp']
Here is my attempt at the solution (note that d_mask is a global variable):
def getContexts(exp_g, cnv_gm):
    lossTrue = d_mask.loc[cnv_g]
    # error is thrown at line below
    loss = np.array(expdf.loc[exp_g].where(lossTrue, np.nan).dropna())
    no_loss = np.array(expdf.loc[exp_g].where(~lossTrue, np.nan).dropna())
    return loss, no_loss
Here is my call to getContexts:
results['loss'], results['no_loss'] = np.vectorize(getContexts)(results['exp'], results['cnv'])
The end result should look like the dataframe below, so that I can check variance, length, mean, and effect-size on the two columns of arrays.
Your code seems to have some reference errors. After I changed dels to deldf and cnv_g to cnv_gm, it no longer throws errors:
import numpy as np
import pandas as pd

deldf = pd.DataFrame([[0,1,0,1],
                      [1,0,1,0],
                      [1,1,1,0]])
deldf.columns = ['a','b','c','d']
deldf['cnv'] = ['k','l','m']
deldf.set_index(deldf['cnv'], inplace=True, drop=True)
del deldf['cnv']
d_mask = deldf == 1  # boolean mask: True where a loss is flagged

expdf = pd.DataFrame([[0,2,1,4,np.array([1,1,1])],
                      [10,0,12,1,np.array([2,2,2])],
                      [1,1,1,1,np.array([3,3,3])]])
expdf.columns = ['a','b','c','d','arr']
expdf['exp'] = ['x','y','Z']
expdf.set_index(expdf['exp'], inplace=True, drop=True)
del expdf['exp']

results = pd.DataFrame(deldf.index)  # was dels.index
results['exp'] = expdf.index
results.columns = ['cnv','exp']

def getContexts(exp_g, cnv_gm):
    lossTrue = d_mask.loc[cnv_gm]  # was cnv_g
    # keep the exp values where the mask is True, drop the rest
    loss = np.array(expdf.loc[exp_g].where(lossTrue, np.nan).dropna())
    # keep the exp values where the mask is False, drop the rest
    no_loss = np.array(expdf.loc[exp_g].where(~lossTrue, np.nan).dropna())
    return loss, no_loss

results['loss'], results['no_loss'] = np.vectorize(getContexts)(results['exp'], results['cnv'])
print(results)
  cnv exp       loss no_loss
0   k   x     [2, 4]  [0, 1]
1   l   y   [10, 12]  [0, 1]
2   m   Z  [1, 1, 1]     [1]
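If np.vectorize still raises the ValueError on your real data, one possible cause (a guess on my part, since the real data isn't shown) is that it infers a numeric output type from the first call and then cannot store whole arrays in it. Below is a minimal sketch that avoids np.vectorize altogether and builds the two columns with a plain list comprehension. The helper name get_contexts is my own, and I left out the 'arr' column so that every row is purely numeric; treat it as an illustration under those assumptions rather than a drop-in replacement.

```python
import numpy as np
import pandas as pd

# Same test data as above, minus the 'arr' column (a simplifying assumption)
deldf = pd.DataFrame([[0, 1, 0, 1],
                      [1, 0, 1, 0],
                      [1, 1, 1, 0]],
                     columns=['a', 'b', 'c', 'd'],
                     index=pd.Index(['k', 'l', 'm'], name='cnv'))
d_mask = deldf == 1

expdf = pd.DataFrame([[0, 2, 1, 4],
                      [10, 0, 12, 1],
                      [1, 1, 1, 1]],
                     columns=['a', 'b', 'c', 'd'],
                     index=pd.Index(['x', 'y', 'Z'], name='exp'))

results = pd.DataFrame({'cnv': deldf.index, 'exp': expdf.index})

def get_contexts(exp_g, cnv_g):
    """Split one exp row into (loss, no_loss) arrays using one cnv mask row."""
    mask = d_mask.loc[cnv_g].to_numpy()   # boolean array over columns a..d
    row = expdf.loc[exp_g].to_numpy()     # numeric values over columns a..d
    return row[mask], row[~mask]

# One (loss, no_loss) pair per row of results
pairs = [get_contexts(e, c) for e, c in zip(results['exp'], results['cnv'])]
results['loss'] = [p[0] for p in pairs]
results['no_loss'] = [p[1] for p in pairs]

# Per-row summaries of the variable-length arrays
results['loss_mean'] = results['loss'].map(np.mean)
results['loss_len'] = results['loss'].map(len)
print(results)
```

If you would rather keep np.vectorize, passing otypes=[object, object] tells it up front that both outputs are arbitrary Python objects, which should also avoid the type inference step.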