Concatenating multiple string columns in pandas dataframe

I'm trying to join 4 columns of a dataframe; each holds a list of values that need to be joined together:

The working code is as follows:

def create_soup(x):
    return ' '.join(x['keywords']) + ' ' + ' '.join(x['cast']) + ' ' + x['director'] + ' ' + ' '.join(x['genres'])
df['soup'] = df.apply(create_soup, axis=1)

My main difficulty in understanding this code is that df.apply works on one row of data at a time here. Why can't I use the same code on the complete dataframe in one go?

Is there any method to directly do this without the apply function?

The data is as follows:

[image: enter image description here]

The final line contains the output of the first movie - cast + director + keywords + genres

Use Series.str.join, which joins the list in each row of a column with the given separator:

df['soup'] = (df['keywords'].str.join(' ') + ' ' + 
              df['cast'].str.join(' ') + ' ' + 
              df['director'] + ' ' +
              df['genres'].str.join(' '))
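
As a quick check, here is a minimal sketch comparing the row-wise apply version with the Series.str.join version. The two-row dataframe is made up for illustration (the real data is only shown as an image), so the values are assumptions; only the column names come from the question.

```python
import pandas as pd

# Toy data, invented for illustration; column names follow the question.
df = pd.DataFrame({
    'keywords': [['space', 'war'], ['heist']],
    'cast': [['Ann Lee', 'Bob Roe'], ['Cy Poe']],
    'director': ['Dee Fox', 'Eli Ray'],
    'genres': [['Action', 'Sci-Fi'], ['Crime']],
})

def create_soup(x):
    # x is one row; x['keywords'] is a plain Python list here
    return (' '.join(x['keywords']) + ' ' + ' '.join(x['cast']) + ' '
            + x['director'] + ' ' + ' '.join(x['genres']))

# Row-by-row version from the question
row_wise = df.apply(create_soup, axis=1)

# Column-wise version: .str.join joins the list stored in each row
vectorized = (df['keywords'].str.join(' ') + ' ' +
              df['cast'].str.join(' ') + ' ' +
              df['director'] + ' ' +
              df['genres'].str.join(' '))

print(row_wise.equals(vectorized))
print(vectorized.iloc[0])
```

On this toy data both approaches should produce the same strings.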

Similarly, the two adjacent list columns can be concatenated first and then joined once:

df['soup'] = ((df['keywords'] + df['cast']).str.join(' ') + ' ' + 
               df['director'] + ' ' +
               df['genres'].str.join(' '))
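
As for why the original function cannot be used on the whole dataframe directly: inside apply, x['keywords'] is a single list of strings, but df['keywords'] is a Series whose elements are lists, so the built-in ' '.join sees lists rather than strings. The sketch below illustrates this with two small made-up Series (the values are assumptions, not the asker's data), and also shows that adding two list columns concatenates the lists row by row.

```python
import pandas as pd

# Toy Series of lists, invented for illustration
keywords = pd.Series([['space', 'war'], ['heist']])
cast = pd.Series([['Ann Lee'], ['Cy Poe', 'Bob Roe']])

# The built-in str.join iterates over the Series and receives lists,
# not strings, so it raises TypeError.
try:
    ' '.join(keywords)
    plain_join_error = None
except TypeError as exc:
    plain_join_error = exc
print(type(plain_join_error).__name__)

# Adding two Series of lists concatenates the lists element-wise
combined = keywords + cast
print(combined.tolist())

# .str.join then joins each row's combined list
joined = combined.str.join(' ')
print(joined.tolist())
```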
