
Compare and delete columns in a DataFrame

[Image: screenshot of the DataFrame]

Python:

I have a DataFrame in which some genre columns are duplicated. I would like to merge the columns for the same genre so that, if either of them has a value of "1", that value is kept.

For example, 0genero_adventure has a value of "0" and 1genero_adventure has a value of "1", so I'd like to keep the "1".

Not only for this example, but for the whole table (which continues with more duplicated genre columns).

Thanks in advance:)
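To make the question reproducible, here is a minimal toy frame with the layout described above. The column names follow the question's 0genero_<genre> / 1genero_<genre> pattern, but the values and the two genres are made up for illustration (the real table was only shown in the screenshot). The desired merge is simply an element-wise OR of the duplicated 0/1 columns:

```python
import pandas as pd

# Hypothetical toy data: each genre is split across a "0genero_" and a "1genero_" column
df = pd.DataFrame({
    "0genero_adventure": [0, 1, 0],
    "1genero_adventure": [1, 0, 0],
    "0genero_comedy": [0, 0, 1],
    "1genero_comedy": [0, 1, 1],
})

# Desired result: one column per genre holding 1 if either duplicate holds a 1 (bitwise OR on 0/1 ints)
expected = pd.DataFrame({
    "adventure": df["0genero_adventure"] | df["1genero_adventure"],
    "comedy": df["0genero_comedy"] | df["1genero_comedy"],
})
print(expected)
```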

I would store the genre names in a list and loop through them: if either of the two columns for a genre is 1, keep 1; otherwise keep 0.

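import numpy as np

genres = ["action", "adventure"]  # .... add the remaining genres
for col in genres:
    # "|" is the element-wise OR for boolean Series; Python's "or" raises a ValueError here
    df[col] = np.where((df["0genero_" + col] == 1) | (df["1genero_" + col] == 1), 1, 0)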

Then drop the original 0genero_* and 1genero_* columns, which you no longer need.
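Putting the loop and the drop step together, a minimal runnable sketch might look like the following. It assumes every genre has exactly two columns named 0genero_<genre> and 1genero_<genre>; the toy data and the short genres list are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data following the question's naming scheme
df = pd.DataFrame({
    "0genero_action": [0, 1, 0],
    "1genero_action": [0, 0, 1],
    "0genero_adventure": [0, 0, 1],
    "1genero_adventure": [1, 0, 1],
})

genres = ["action", "adventure"]  # list every genre that appears in the table
for col in genres:
    left, right = "0genero_" + col, "1genero_" + col
    # 1 if either duplicated column holds a 1 in that row, else 0
    df[col] = np.where((df[left] == 1) | (df[right] == 1), 1, 0)
    # Drop the two source columns once they have been merged
    df = df.drop(columns=[left, right])

print(df)
```

Reassigning df = df.drop(...) rather than using inplace=True keeps each step explicit; either works here.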

If I understood your problem correctly, the code below should work for you. The one requirement is that you create a list with the names of the genres.

genre_list = ["genero_Adventure", "genero_Biography", "genero_Comedy"]  #Add all the genre names like this
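If typing the list by hand is tedious, it could be built from the column names instead. This sketch assumes (not confirmed by the screenshot) that every duplicated column is named <digits>genero_<Name>; it strips the leading digits and keeps each resulting name once, in first-seen order. The column names are hypothetical.

```python
import re

import pandas as pd

# Hypothetical columns; assumes duplicated genre columns look like "<digits>genero_<Name>"
df = pd.DataFrame(columns=["title", "0genero_Adventure", "1genero_Adventure",
                           "0genero_Comedy", "1genero_Comedy", "2genero_Comedy"])

pattern = re.compile(r"^\d+(genero_.+)$")
genre_list = []
for col in df.columns:
    match = pattern.match(col)
    # Keep the part after the leading digits, e.g. "genero_Adventure", once per genre
    if match and match.group(1) not in genre_list:
        genre_list.append(match.group(1))

print(genre_list)  # ['genero_Adventure', 'genero_Comedy']
```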

Then this loop should do the job:

for genre in genre_list:
    # All columns whose name contains the genre, e.g. "0genero_Adventure" and "1genero_Adventure"
    genre_cols_list = [col for col in df.columns if genre in col]

    # Row-wise max: 1 if any of those columns holds a 1, else 0
    merged = df[genre_cols_list].max(axis=1)

    # Delete the duplicated columns first, then store the merged values under the plain genre name
    df = df.drop(columns=genre_cols_list)
    df[genre] = merged
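As a usage check, the loop can be wrapped in a small function and run on a toy frame. The helper name merge_duplicate_genres and the data are hypothetical. One deliberate change: it matches columns with str.endswith instead of the substring test "genre in col", so that a name like genero_Comedy cannot accidentally pick up a hypothetical 1genero_ComedyDrama column. Non-genre columns such as title are left untouched.

```python
import pandas as pd

def merge_duplicate_genres(df, genre_list):
    """Hypothetical helper wrapping the loop above; returns a new DataFrame."""
    out = df.copy()
    for genre in genre_list:
        # endswith() avoids matching longer genre names that merely contain this one
        cols = [col for col in out.columns if col.endswith(genre)]
        merged = out[cols].max(axis=1)  # 1 if any duplicate holds a 1 in that row
        out = out.drop(columns=cols)
        out[genre] = merged
    return out

df = pd.DataFrame({
    "title": ["A", "B", "C"],
    "0genero_Adventure": [0, 1, 0],
    "1genero_Adventure": [1, 1, 0],
    "0genero_Comedy": [0, 0, 1],
    "1genero_Comedy": [0, 0, 0],
})
result = merge_duplicate_genres(df, ["genero_Adventure", "genero_Comedy"])
print(result)
```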
