Concatenate strings of a grouped pandas DataFrame

From an SQL query, I got a DataFrame similar to this one:

df = pd.DataFrame([
        ['ABC', 'Order'],
        ['ABC', 'Address'],
        ['ABC', 'Zip'],
        ['XYZ', 'Customer'],
        ['XYZ', 'Name']
    ],
    columns=("Table", "Column"))

  Table    Column
0   ABC     Order
1   ABC   Address
2   ABC       Zip
3   XYZ  Customer
4   XYZ      Name

I am trying to save this information in a separate file, like:

Table ABC has columns: Order, Address, Zip

One line per table, with each table appearing only once.

How can I achieve this?

I already tried:

for table_name in df.TABLE_NAME:
  output = "Table" + Table_name + "are" + (df.iloc[:,2])

But I am not getting the desired output.

Grouping by the Table column and joining each group's Column values into a single string gives you what you expect.

import pandas as pd

if __name__ == '__main__':
    df = pd.DataFrame([
        ['ABC', 'Order'],
        ['ABC', 'Address'],
        ['ABC', 'Zip'],
        ['XYZ', 'Customer'],
        ['XYZ', 'Name']
    ],
    columns=("Table", "Column"))

    pretty = pd.concat(
        (df['Table'],
        df.groupby("Table")['Column'].transform(lambda x: ", ".join(x))),
        axis=1
    ).drop_duplicates()

    for _, row in pretty.iterrows():
        print("Table '{}' has columns: {}".format(row['Table'], row['Column']))

Table 'ABC' has columns: Order, Address, Zip
Table 'XYZ' has columns: Customer, Name
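
Since the question asks for the lines to be saved to a separate file, here is a minimal sketch of a shorter variant. It aggregates each group directly with `agg` instead of `transform` plus `drop_duplicates`, then writes one line per table. The output path `tables.txt` is a hypothetical name chosen for illustration, and the line format follows the one given in the question.

```python
import pandas as pd

df = pd.DataFrame(
    [
        ["ABC", "Order"],
        ["ABC", "Address"],
        ["ABC", "Zip"],
        ["XYZ", "Customer"],
        ["XYZ", "Name"],
    ],
    columns=["Table", "Column"],
)

# Collapse each group to a single comma-separated string.
# sort=False keeps the tables in order of first appearance.
joined = df.groupby("Table", sort=False)["Column"].agg(", ".join)

# Build one descriptive line per table.
lines = [
    f"Table {table} has columns: {columns}"
    for table, columns in joined.items()
]

# "tables.txt" is an assumed output path, not part of the original question.
with open("tables.txt", "w", encoding="utf-8") as fh:
    fh.write("\n".join(lines) + "\n")
```

Because `agg` already returns one row per table, no de-duplication step is needed afterwards.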
