
How can I add a feature to a DataFrame based on a complex condition?

    import pandas as pd

    df = pd.DataFrame({'col_a': [0, 0, 0, 1, 1, 1],
                       'col_b': [1, 0, 0, 1, 0, 1],
                       'col_c': [1, 0, 0, 1, 0, 1]})
    df

       col_a  col_b  col_c
    0      0      1      1
    1      0      0      0
    2      0      0      0
    3      1      1      1
    4      1      0      0
    5      1      1      1

I want to add a new feature to this df based on the following rule (pseudocode): if the 1s are the majority of the values in a row, the new value is 1, like a majority vote. I tried looping over every column, but the original data has 10,000 rows and it takes several minutes (I think the pandas API would be faster). I also tried `apply` and `assign`, but they failed because I am not familiar with pandas. I would like to learn how to do this with the pandas API. Thank you all.

You can use `mode` along the rows (`axis=1`):

    df['col_d'] = df.mode(axis=1)
    print(df)

    # Output
       col_a  col_b  col_c  col_d
    0      0      1      1      1
    1      0      0      0      0
    2      0      0      0      0
    3      1      1      1      1
    4      1      0      0      0
    5      1      1      1      1
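A small sketch of the same idea that picks the first mode column explicitly. `DataFrame.mode(axis=1)` returns a DataFrame rather than a Series, and a row with tied values produces more than one mode column. The four-column frame `df_even` is a made-up example, not part of the question, used only to show the tie case.

```python
import pandas as pd

df = pd.DataFrame({'col_a': [0, 0, 0, 1, 1, 1],
                   'col_b': [1, 0, 0, 1, 0, 1],
                   'col_c': [1, 0, 0, 1, 0, 1]})

# mode(axis=1) returns a DataFrame; column 0 holds the first mode of each row.
modes = df.mode(axis=1)
df['col_d'] = modes[0].astype(int)

# Hypothetical frame with an even number of columns: row 0 is a 2-2 tie.
df_even = pd.DataFrame({'a': [0, 1], 'b': [1, 1], 'c': [0, 1], 'd': [1, 1]})
modes_even = df_even.mode(axis=1)

# The tie adds a second mode column (NaN for rows without a tie),
# so selecting column 0 resolves ties toward the smaller value.
first_mode = modes_even[0].astype(int)
```

With three 0/1 columns a tie cannot occur, so selecting column 0 gives the same result as the one-line assignment above.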

You can sum across the columns of each row; if the result is greater than 1 (that is, at least 2 of the 3 values are 1), then 1 is the majority:

    import numpy as np

    df['feature'] = np.where(df.sum(axis=1).ge(2), '1 majority', '0 majority')
    print(df)

       col_a  col_b  col_c     feature
    0      0      1      1  1 majority
    1      0      0      0  0 majority
    2      0      0      0  0 majority
    3      1      1      1  1 majority
    4      1      0      0  0 majority
    5      1      1      1  1 majority

