
Group by in Python Pandas (Multiple columns join with , )

I have a table in CSV just like this:

Base CSV

And I need to group it just like this:

In all my CONCURSO only CIDADE and UF change.

Expected CSV

I'm trying this code, but it doesn't work.

Can you guys help me, please?

import...

    new_df = pd.read_csv(fr'C:\Users\anton\Desktop\Anon\data\swamp\{date}\nao_tratado.csv')
    new_df = new_df.groupby(by=['Concurso'], as_index=False).agg(','.join)
    new_df = pd.concat([new_df]).to_csv(fr'C:\Users\anton\Desktop\Anon\data\lake\{date}\tratado.csv', index=False)
    print('We are done.')

The agg() method of a pandas groupby can take a dictionary as its func parameter. The dictionary maps each column name to the aggregation function applied to that column.
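To make the dictionary form concrete, here is a minimal sketch on a toy DataFrame. The column names follow the question, but the `Premio` column and all values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: the values and the "Premio" column are assumptions.
df = pd.DataFrame({
    "Concurso": [1, 1, 2],
    "Cidade": ["Recife", "Natal", "Curitiba"],
    "UF": ["PE", "RN", "PR"],
    "Premio": [100, 100, 250],
})

# Each key is a column name; each value is the aggregation applied to that column.
result = df.groupby("Concurso", as_index=False).agg({
    "Cidade": ", ".join,   # string columns: concatenate the group's values
    "UF": ", ".join,
    "Premio": "min",       # other columns: keep a single value per group
})
print(result)
```

Plain `", ".join` only works when every value in the column is a string, so columns with numbers or missing values need a conversion such as `map(str, ...)` first.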

I guess you can then do the following:

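# Columns whose values are joined into one string per group
columns_to_aggregate = ["Cidade", "UF"]
columns_for_groupby = ["Concurso"]
# Every non-grouping column, kept in its original order (a set would shuffle them)
columns = [c for c in new_df.columns if c not in columns_for_groupby]
# Join the listed columns as strings; keep the minimum of every other column
aggregation_func = {c: (lambda x: ", ".join(map(str, x))) if c in columns_to_aggregate else "min" for c in columns}
new_df = new_df.groupby(by=columns_for_groupby, as_index=False).agg(aggregation_func)
new_df.to_csv(fr'C:\Users\anton\Desktop\Anon\data\lake\{date}\tratado.csv', index=False)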
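If you want to try the idea without touching the real files, a self-contained sketch like the one below might help. It reads a small in-memory CSV instead of `nao_tratado.csv`. The helper name `join_by_group`, the `Data` column and all sample rows are assumptions made for illustration, not part of the original data.

```python
import io

import pandas as pd


def join_by_group(df, group_cols, join_cols, sep=", "):
    """Group by group_cols; join join_cols as strings, take the min of the rest."""
    # Keep the non-grouping columns in their original order.
    other_cols = [c for c in df.columns if c not in group_cols]
    funcs = {
        c: (lambda s: sep.join(map(str, s))) if c in join_cols else "min"
        for c in other_cols
    }
    return df.groupby(by=group_cols, as_index=False).agg(funcs)


# Hypothetical stand-in for the input CSV.
raw = io.StringIO(
    "Concurso,Data,Cidade,UF\n"
    "10,2020-01-04,Recife,PE\n"
    "10,2020-01-04,Natal,RN\n"
    "11,2020-01-08,Curitiba,PR\n"
)
grouped = join_by_group(pd.read_csv(raw), ["Concurso"], ["Cidade", "UF"])

# Write to an in-memory buffer instead of a file on disk.
out = io.StringIO()
grouped.to_csv(out, index=False)
print(out.getvalue())
```

Because the joined values contain commas, `to_csv` wraps those fields in double quotes by default, so the output file remains a valid CSV.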

Tell me if it does not work :)


 