
How can I combine several values in one column while all other columns stay the same?

I need help combining the language values into a single row and dropping the duplicate rows, that is, merging the different languages listed for the same Movie, Year, and Id.

There are more columns like this in the CSV, so please help me figure out a way to combine those as well. Here is the existing CSV:

import pandas as pd

f = pd.DataFrame({'Movie': ['name1','name1','name2','name3','name4','name4'],
                  'Year': ['1905', '1905','1906','1907','1910','1910'],
                  'Id': ['tt0283985', 'tt0283985','tt0284043','tt3402904','tt3458360','tt3458360'],
                  'language':['Mandarian','Cantonese','Mandarian','unknown','Cantonese','Mandarian']})

Where f now looks like:

   Movie  Year         Id   language
0  name1  1905  tt0283985  Mandarian
1  name1  1905  tt0283985  Cantonese
2  name2  1906  tt0284043  Mandarian
3  name3  1907  tt3402904    unknown
4  name4  1910  tt3458360  Cantonese
5  name4  1910  tt3458360  Mandarian

And the result should be like this:

   Movie  Year         Id             language
0  name1  1905  tt0283985  Mandarian,Cantonese
1  name2  1906  tt0284043            Mandarian
2  name3  1907  tt3402904              unknown
3  name4  1910  tt3458360  Cantonese,Mandarian

Rows 1 and 2 (index 0 and 1) are identical except for language, so we just need to combine them into one row; the same goes for rows 5 and 6 (index 4 and 5). Here's my attempt:

ff = f.groupby(by = ['Movie','Year','Id']).agg(','.join)
ff.to_csv("File.csv", index=False)

But the output is weird: all the other columns disappeared, and language is the only column left:

language
Mandarian,Cantonese
Mandarian
unknown
Cantonese,Mandarian

...

By default, groupby sets the grouping keys as the index, and you explicitly asked to_csv not to export the index, so the key columns are dropped from the file.

Use as_index=False in groupby:

ff = f.groupby(by=['Movie','Year','Id'], as_index=False).agg(','.join)
ff.to_csv("File.csv", index=False)

Or, export the index in to_csv:

ff = f.groupby(by=['Movie','Year','Id']).agg(','.join)
ff.to_csv("File.csv")

NB. If you do not need the intermediate result, there is no need to assign the ff variable; you can chain the calls directly: f.groupby(...).agg(...).to_csv(...)
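To see the cause and both fixes side by side, here is a minimal sketch built on the sample frame from the question. It writes to an in-memory buffer instead of File.csv, and the to_csv_text helper is a hypothetical convenience, not part of pandas; both are assumptions made only to keep the example self-contained.

```python
import io

import pandas as pd

f = pd.DataFrame({'Movie': ['name1', 'name1', 'name2', 'name3', 'name4', 'name4'],
                  'Year': ['1905', '1905', '1906', '1907', '1910', '1910'],
                  'Id': ['tt0283985', 'tt0283985', 'tt0284043', 'tt3402904', 'tt3458360', 'tt3458360'],
                  'language': ['Mandarian', 'Cantonese', 'Mandarian', 'unknown', 'Cantonese', 'Mandarian']})


def to_csv_text(df, **kwargs):
    """Hypothetical helper: return the CSV text that to_csv would write."""
    buf = io.StringIO()
    df.to_csv(buf, **kwargs)
    return buf.getvalue()


# Default groupby: the keys move into the index, 'language' is the only column.
indexed = f.groupby(by=['Movie', 'Year', 'Id']).agg(','.join)
print(list(indexed.index.names))
print(list(indexed.columns))

# Reproduces the problem: index=False discards the keys held in the index.
csv_broken = to_csv_text(indexed, index=False)

# Fix 1: keep the keys as regular columns, then skip the index on export.
flat = f.groupby(by=['Movie', 'Year', 'Id'], as_index=False).agg(','.join)
csv_flat = to_csv_text(flat, index=False)

# Fix 2: leave the keys in the index, but export the index.
csv_indexed = to_csv_text(indexed)

print(csv_flat)
```

Note that the joined values contain commas, so to_csv quotes them in the output file.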

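Use the following command: ff = f.groupby(['Movie','Year','Id']).agg(','.join).reset_index()

It should work, because reset_index() moves the grouping keys from the index back into regular columns, so to_csv(index=False) no longer drops them.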
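The question mentions that the real CSV has more columns that need the same treatment. The following is only a sketch of how the reset_index() approach might extend to that case: the 'genre' column and the join_unique helper are hypothetical, invented here for illustration. The helper also drops repeated values within a group, which goes slightly beyond the plain ','.join used above.

```python
import pandas as pd

# Sample frame from the question plus a hypothetical 'genre' column.
f = pd.DataFrame({'Movie': ['name1', 'name1', 'name2', 'name3', 'name4', 'name4'],
                  'Year': ['1905', '1905', '1906', '1907', '1910', '1910'],
                  'Id': ['tt0283985', 'tt0283985', 'tt0284043', 'tt3402904', 'tt3458360', 'tt3458360'],
                  'language': ['Mandarian', 'Cantonese', 'Mandarian', 'unknown', 'Cantonese', 'Mandarian'],
                  'genre': ['drama', 'drama', 'comedy', 'drama', 'action', 'drama']})


def join_unique(values):
    """Join the distinct values of one group with commas, in first-seen order."""
    return ','.join(dict.fromkeys(values))


# agg applies join_unique to every non-key column; reset_index turns the
# grouping keys back into ordinary columns.
ff = f.groupby(['Movie', 'Year', 'Id']).agg(join_unique).reset_index()
print(ff)
```

With this helper, name1 keeps a single 'drama' rather than 'drama,drama', while its two languages are still combined.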
