```python
import pandas as pd

df = pd.DataFrame({'col_a': [0, 0, 0, 1, 1, 1],
                   'col_b': [1, 0, 0, 1, 0, 1],
                   'col_c': [1, 0, 0, 1, 0, 1]})
print(df)
```

```
   col_a  col_b  col_c
0      0      1      1
1      0      0      0
2      0      0      0
3      1      1      1
4      1      0      0
5      1      1      1
```
I want to add a new feature (column) to this df based on the following rule (pseudocode): mark whether 1s are the majority in that row, just like a vote. I have tried a `for` loop over every column, but the original data has 10,000 rows and it takes several minutes (I think using the pandas API would be faster). I have also tried `apply` and `assign`, but failed because I'm not familiar with the pandas package. I want to learn how to do this with the pandas API. Thank you all!
You can use `mode`:
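```python
# Most frequent value in each row; take column 0 (the smallest mode) in case a row has several modes
df['col_d'] = df.mode(axis=1)[0]
print(df)
```

```
   col_a  col_b  col_c  col_d
0      0      1      1      1
1      0      0      0      0
2      0      0      0      0
3      1      1      1      1
4      1      0      0      0
5      1      1      1      1
```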
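One thing to watch with `mode`: with three 0/1 columns a row can never tie, but with an even number of columns it can, and `df.mode(axis=1)` then returns more than one column (the tied values, in sorted order, with NaN filling rows that have fewer modes). The sketch below uses a hypothetical two-column frame `df2` to illustrate; taking column `0` means a tie resolves to 0.

```python
import pandas as pd

# Hypothetical two-column frame: row 1 is a 1-vs-1 tie
df2 = pd.DataFrame({'col_a': [1, 1, 0],
                    'col_b': [1, 0, 0]})

modes = df2.mode(axis=1)
print(modes)
# Row 1 has two modes (0 and 1), so `modes` has two columns;
# rows without a tie get NaN in the second column.

# Column 0 holds the smallest mode of each row, so ties resolve to 0
df2['col_d'] = modes[0].astype(int)
print(df2['col_d'].tolist())  # [1, 0, 0]
```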
You can sum across the columns of each row: if the sum is greater than 1 (i.e. at least 2 of the 3 values are 1), then 1 is the majority.
```python
import numpy as np

# A row sum of 2 or more means at least two of the three columns are 1
df['feature'] = np.where(df.sum(axis=1).ge(2), '1 majority', '0 majority')
print(df)
```

```
   col_a  col_b  col_c     feature
0      0      1      1  1 majority
1      0      0      0  0 majority
2      0      0      0  0 majority
3      1      1      1  1 majority
4      1      0      0  0 majority
5      1      1      1  1 majority
```
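The threshold `2` above is hard-coded for three columns. A minimal sketch of a variant that works for any number of 0/1 columns, assuming "majority" means strictly more than half of the values are 1 (so a tie counts as no majority): the row mean of 0/1 values is the fraction of 1s, so compare it with 0.5 and cast the result to int. Listing the voting columns explicitly (`vote_cols` is an illustrative name) keeps any previously added column, such as `col_d`, out of the count.

```python
import pandas as pd

df = pd.DataFrame({'col_a': [0, 0, 0, 1, 1, 1],
                   'col_b': [1, 0, 0, 1, 0, 1],
                   'col_c': [1, 0, 0, 1, 0, 1]})

vote_cols = ['col_a', 'col_b', 'col_c']  # only these columns take part in the vote

# Mean of 0/1 values = fraction of 1s; > 0.5 means the 1s are a strict majority
df['majority'] = (df[vote_cols].mean(axis=1) > 0.5).astype(int)
print(df['majority'].tolist())  # [1, 0, 0, 1, 0, 1]
```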