I'm trying to join 4 columns of a dataframe; each has a list of values that need to be joined together.
The working code is as follows:
def create_soup(x):
    return ' '.join(x['keywords']) + ' ' + ' '.join(x['cast']) + ' ' + x['director'] + ' ' + ' '.join(x['genres'])
df['soup'] = df.apply(create_soup, axis=1)
My main difficulty in understanding this code is that df.apply works on one row of data at a time here. Why can't I use the same code on the complete dataframe in one go?
Is there any method to directly do this without the apply function?
The data is as follows:
The final line contains the output of the first movie - cast + director + keywords + genres
Use Series.str.join:
df['soup'] = (df['keywords'].str.join(' ') + ' ' +
              df['cast'].str.join(' ') + ' ' +
              df['director'] + ' ' +
              df['genres'].str.join(' '))
Similarly, you can concatenate the two list columns first and join them once:
df['soup'] = ((df['keywords'] + df['cast']).str.join(' ') + ' ' +
              df['director'] + ' ' +
              df['genres'].str.join(' '))
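As for why the row function cannot simply be called on the whole dataframe: here is a small sketch that may help. The sample rows are made up for illustration (the original table is not shown), and it assumes keywords, cast and genres hold lists of strings while director holds a plain string. Inside apply, x['keywords'] is a single list, so the built-in ' '.join works. On the full dataframe, x['keywords'] is a Series whose items are lists, so the built-in join fails, whereas Series.str.join joins the list inside each cell.

```python
import pandas as pd

# Hypothetical sample data; the table from the question is not available.
df = pd.DataFrame({
    "keywords": [["jealousy", "toy"], ["board game"]],
    "cast": [["Tom Hanks", "Tim Allen"], ["Robin Williams"]],
    "director": ["John Lasseter", "Joe Johnston"],
    "genres": [["Animation", "Comedy"], ["Adventure"]],
})

def create_soup(x):
    return (' '.join(x['keywords']) + ' ' + ' '.join(x['cast']) + ' '
            + x['director'] + ' ' + ' '.join(x['genres']))

# Row-wise: x is one row, so x['keywords'] is a plain list of strings.
by_row = df.apply(create_soup, axis=1)

# Whole frame: x['keywords'] is a Series of lists, so the built-in
# str.join receives lists instead of strings and raises TypeError.
try:
    create_soup(df)
    whole_frame_error = None
except TypeError as exc:
    whole_frame_error = exc

# Vectorized: Series.str.join joins the list stored in each cell.
vectorized = (df['keywords'].str.join(' ') + ' ' +
              df['cast'].str.join(' ') + ' ' +
              df['director'] + ' ' +
              df['genres'].str.join(' '))

print(type(whole_frame_error).__name__)
print(vectorized.tolist() == by_row.tolist())
```

On data shaped like this, both approaches should produce the same strings; the vectorized version just avoids calling a Python function once per row.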