I am working with a fairly large data set. I am using a pandas DataFrame to handle it and am stuck on finding an efficient way to parse the data into two formatted lists.
Here is my DataFrame object:
        fet1   fet2   fet3   fet4   fet5
stim1   True   True  False  False  False
stim2   True  False  False  False   True
stim3 ...................................
stim4 ...................................
stim5 ............................. so on
I am trying to parse each row and create two lists. The first list should hold the names of the columns whose value is True, and the second the names of the columns whose value is False.
example for stim 1:
list_1=[fet1,fet2]
list_2=[fet3,fet4,fet5]
I know I can brute-force this and iterate over the rows. I could also transpose, convert to a dictionary, and parse that, or create sparse Series objects and then build sets, but then I have to reference the column names separately.
The problem is that every approach I try ends up with quadratic, O(n^2), running time.
Is there a more efficient way to do this with built-in pandas functionality?
Thanks for your help.
Is this what you want?
>>> df
        fet1   fet2   fet3   fet4   fet5
stim1   True   True  False  False  False
stim2   True  False  False  False   True
>>> def func(row):
...     # row.index holds the column names; index it with a boolean mask
...     return [
...         row.index[row == True],
...         row.index[row == False]
...     ]
...
>>> df.apply(func, axis=1)
stim1 [[fet1, fet2], [fet3, fet4, fet5]]
stim2 [[fet1, fet5], [fet2, fet3, fet4]]
dtype: object
This may or may not be faster: apply with axis=1 still calls func once per row. I do not think a more succinct solution is possible.
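For comparison, here is a minimal sketch of the same per-row split done on the underlying NumPy arrays instead of through apply. The sample frame is rebuilt from the two rows shown above, and the names mask, cols, true_lists and false_lists are illustrative only. It still visits every row once, so whether it is actually faster on a given data set would have to be measured.

```python
import numpy as np
import pandas as pd

# Two-row sample frame copied from the question (illustrative data only).
df = pd.DataFrame(
    [[True, True, False, False, False],
     [True, False, False, False, True]],
    index=["stim1", "stim2"],
    columns=["fet1", "fet2", "fet3", "fet4", "fet5"],
)

mask = df.to_numpy(dtype=bool)            # shape (n_rows, n_cols)
cols = df.columns.to_numpy(dtype=object)  # column names as a plain array

# One boolean-indexing operation per row; ~m flips the row mask.
true_lists = {stim: cols[m].tolist() for stim, m in zip(df.index, mask)}
false_lists = {stim: cols[~m].tolist() for stim, m in zip(df.index, mask)}

print(true_lists["stim1"], false_lists["stim1"])
```

Keying the results by row label keeps each list tied to its stim without a second lookup.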
Fast (not row-by-row) operations can get this far.
In [126]: (np.array(df.columns)*~df)[~df]
Out[126]:
      fet1  fet2  fet3  fet4  fet5
stim1  NaN   NaN  fet3  fet4  fet5
stim2  NaN  fet2  fet3  fet4   NaN
But at this point, because the rows might have variable length, the array structure must be broken and each row must be considered individually.
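The masked frame from In [126] can also be built with numpy.where, which does not depend on multiplying strings by booleans. This is a sketch under the assumption of the two-row sample frame. The names true_names and false_names are made up for illustration, and it is an alternative spelling of the masking step, not a claim about how the expression above behaves on any particular pandas version.

```python
import numpy as np
import pandas as pd

# Two-row sample frame copied from the question (illustrative data only).
df = pd.DataFrame(
    [[True, True, False, False, False],
     [True, False, False, False, True]],
    index=["stim1", "stim2"],
    columns=["fet1", "fet2", "fet3", "fet4", "fet5"],
)

mask = df.to_numpy(dtype=bool)
names = df.columns.to_numpy(dtype=object)

# Keep the column name where the mask is True, NaN elsewhere.
# `names` has shape (n_cols,) and broadcasts across the rows of `mask`.
true_names = pd.DataFrame(np.where(mask, names, np.nan),
                          index=df.index, columns=df.columns)
false_names = pd.DataFrame(np.where(~mask, names, np.nan),
                           index=df.index, columns=df.columns)

# The ragged part still has to be handled per row, e.g. with dropna().
print(true_names.loc["stim1"].dropna().tolist())
```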
In [122]: (np.array(df.columns)*df)[df].apply(lambda x: Series([x.dropna()]), 1)
Out[122]:
0
stim1 [fet1, fet2]
stim2 [fet1, fet5]
In [125]: (np.array(df.columns)*~df)[~df].apply(lambda x: Series([x.dropna()]), 1)
Out[125]:
0
stim1 [fet3, fet4, fet5]
stim2 [fet2, fet3, fet4]
The slowest step is probably the Series constructor. I'm pretty sure there's no way around it though.
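One possible way to sidestep the per-row Series construction is to collect the column positions of all True cells in a single call and then cut that flat array at the row boundaries. The sketch below assumes the two-row sample frame. split_names is a hypothetical helper, not a pandas function, and it has not been benchmarked here against the approaches above.

```python
import numpy as np
import pandas as pd

# Two-row sample frame copied from the question (illustrative data only).
df = pd.DataFrame(
    [[True, True, False, False, False],
     [True, False, False, False, True]],
    index=["stim1", "stim2"],
    columns=["fet1", "fet2", "fet3", "fet4", "fet5"],
)

def split_names(mask, names):
    """Return one list of names per row of `mask`, taken where it is True."""
    # np.nonzero walks the array in row-major order, so the column
    # positions come back already grouped by row.
    _, col_idx = np.nonzero(mask)
    # The cumulative count of True cells per row gives the cut points.
    boundaries = np.cumsum(mask.sum(axis=1))[:-1]
    return [chunk.tolist() for chunk in np.split(names[col_idx], boundaries)]

mask = df.to_numpy(dtype=bool)
names = df.columns.to_numpy(dtype=object)

true_lists = split_names(mask, names)
false_lists = split_names(~mask, names)
print(true_lists, false_lists)
```

The results are ordered like df.index, so zip(df.index, true_lists) pairs each list with its row label. A row with no True cells yields an empty list.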