How can I combine several elements in columns with all other columns stay the same?

Question

I need help with combining language columns into one row, and then drop duplicate columns, just combine two different language of the same Movie , year , and Id .

There are more similar columns in the CSV, so please help me figure out a way to combine those.Here is the existing csv:

f = pd.DataFrame({'Movie': ['name1','name1','name2','name3','name4','name4'],
                  'Year': ['1905', '1905','1906','1907','1910','1910'],
                  'Id': ['tt0283985', 'tt0283985','tt0284043','tt3402904','tt3458360','tt3458360'],
                  'language':['Mandarian','Cantonese','Mandarian','unknown','Cantonese','Mandarian']})

Where f now looks like:

   Movie  Year         Id   language
0  name1  1905  tt0283985  Mandarian
1  name1  1905  tt0283985  Cantonese
2  name2  1906  tt0284043  Mandarian
3  name3  1907  tt3402904    unknown
4  name4  1910  tt3458360  Cantonese
5  name4  1910  tt3458360  Mandarian

And the result should be like this:

   Movie  Year         Id             language
0  name1  1905  tt0283985  Mandarian,Cantonese
1  name2  1906  tt0284043            Mandarian
2  name3  1907  tt3402904              unknown
3  name4  1910  tt3458360  Cantonese,Mandarian

So row 1 and 2 are identical except language, we just need to combine them into one row, so does row 5 and 6. Here's my try:

ff = new_f.groupby(by = ['Movie','Year','Id']).agg(','.join)
ff.to_csv("File.csv", index=False)

But the output is weird: All other columns disappeared and language is the only column left:

language
Mandarian,Cantonese
Mandarian
unknown
Cantonese,Mandarian

...

Answer 1

By default, groupby sets the grouping keys as index, and you explicitly asked to_csv not to export the index,

Use as_index=False in groupby :

ff = f.groupby(by=['Movie','Year','Id'], as_index=False).agg(','.join)
ff.to_csv("File.csv", index=False)

Or, export the index in to_csv :

ff = f.groupby(by=['Movie','Year','Id']).agg(','.join)
ff.to_csv("File.csv")

NB. if the intermediate is not useful to you, you do not need to set the ff variable, you can directly chain f.groupby(...).agg(...).to_csv(...)

Answer 2

Use the following command: ff = f.groupby(['Movie','Year','Id']).agg(','.join).reset_index()

It should work.

How can I combine several elements in columns with all other columns stay the same?

Question

2 answers

solution1
1 ACCPTED 2022-03-29 02:48:17

solution2
0 2022-03-29 03:34:57

How can I combine several elements in columns with all other columns stay the same?

Question

2 answers

solution1 1 ACCPTED 2022-03-29 02:48:17

solution2 0 2022-03-29 03:34:57

solution1
1 ACCPTED 2022-03-29 02:48:17

solution2
0 2022-03-29 03:34:57