Machine learning random forest classifier

Question

import pandas as pd
data = pd.DataFrame({'gender': ['m', 'f', 'm'], 'icds': [['i10'], ['i20', 'i30'], ['i40']], 'med': [[1, 2, 4, 5], [3, 4, 6], [5, 6, 7]]})

Which machine learning algorithm should I use for this type of data? My concern is the inconsistent length of the arrays in the med column, which gets in the way whenever I try to pass the data to a random forest classifier. The med column is basically the labels.

Answer 1

Yes, you are right: a random forest is a reasonable choice, and logistic regression should also work. The issue is the inconsistent length of the lists in the 'med' column. If you do not need the individual values, you can use the following functions to sum or average the numeric data in each list of the 'med' column:

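import numpy as np
import pandas as pd

def sum_out(x):
    # Sum of the values in one list, ignoring NaNs
    return np.nansum(x)

def avg_out(x):
    # Mean of the values in one list, ignoring NaNs
    return np.nanmean(x)

data = pd.DataFrame({'gender': ['m', 'f', 'm'], 'icds': [['i10'], ['i20', 'i30'], ['i40']], 'med': [[1, 2, 4, 5], [3, 4, 6], [5, 6, 7]]})

data['med_sum'] = data['med'].map(sum_out)
data['med_avg'] = data['med'].map(avg_out)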

Answer 2

You can add each medication as its own 0/1 indicator column, something like this:

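import numpy as np
import pandas as pd

data = pd.DataFrame({'gender': ['m', 'f', 'm'], 'icds': [['i10'], ['i20', 'i30'], ['i40']], 'med': [['xanex', 'isotopin'], ['cz3', 'hicet', 't-montair'], ['t-montair', 'xanex']]})

# Sorted list of every distinct medication across all rows
# (a nested comprehension replaces the undefined flatten helper)
all_med = list(np.unique([m for row in data['med'] for m in row]))

# Add one column per medication: 1 if the row's list contains it, else 0
for meds in all_med:
    med_list = []
    for i in range(len(data)):  # range instead of Python 2's xrange
        d = data['med'][i]
        if meds in d:
            med_list.append(1)
        else:
            med_list.append(0)
    data[meds] = med_list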

Output:

  gender        icds                      med  cz3  hicet  isotopin  \
0      m       [i10]        [xanex, isotopin]    0      0         1
1      f  [i20, i30]  [cz3, hicet, t-montair]    1      1         0
2      m       [i40]       [t-montair, xanex]    0      0         0

   t-montair  xanex
0          0      1
1          1      0
2          1      1
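
Since the question says the med column holds the labels, the indicator columns above can also serve as a multi-label target rather than as features. The following is only a minimal sketch of that idea, not something either answer proposed: it assumes scikit-learn is available, uses its MultiLabelBinarizer to build the same 0/1 layout as the loop above, and picks n_estimators and random_state arbitrarily. Three rows are far too few to train anything meaningful, so treat it as a shape check only.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

data = pd.DataFrame({
    'gender': ['m', 'f', 'm'],
    'icds': [['i10'], ['i20', 'i30'], ['i40']],
    'med': [['xanex', 'isotopin'],
            ['cz3', 'hicet', 't-montair'],
            ['t-montair', 'xanex']],
})

# Inputs: one indicator column per ICD code, plus gender as a 0/1 column.
icd_encoder = MultiLabelBinarizer()
X = pd.DataFrame(icd_encoder.fit_transform(data['icds']),
                 columns=icd_encoder.classes_, index=data.index)
X['gender_m'] = (data['gender'] == 'm').astype(int)

# Targets: one indicator column per medication, so every row has the
# same length regardless of how many medications it lists.
med_encoder = MultiLabelBinarizer()
Y = med_encoder.fit_transform(data['med'])

# A 2-D indicator matrix is treated as a multi-output target.
# The hyperparameters here are arbitrary choices for the sketch.
clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X, Y)

# Map predicted indicator rows back to tuples of medication names.
predicted = med_encoder.inverse_transform(clf.predict(X))
```

Here predicted holds one tuple of medication names per row, so the variable-length lists only exist at the edges of the pipeline while the classifier itself sees fixed-width arrays.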

2 answers

solution1
0 2018-07-20 09:28:35

solution2
0 2018-07-20 12:59:32
