Pandas: iteratively append row values from multiple DataFrame columns

Question

I want to iteratively append row values from multiple columns to a new column in a new DataFrame, based on a group.

My goal is to have one row for each customer, with one column for the customer's ID and one column for their timeline. The timeline lists the date of each event followed by the event description, for all dates and events, in chronological order.

I have solved this with a series of dictionaries. I am looking for a clean, elegant, pandas-style way to accomplish this, because this code will be run frequently with small changes to customers, events, etc.

Example:

import pandas as pd

df_have = pd.DataFrame({'Customer_ID': ['customer_1', 'customer_1', 'customer_1', 'customer_2', 'customer_2'],
                        'Event': ['purchased cornflakes', 'purchased eggs', 'purchased waffles', 'sold eggs', 'purchased cows'],
                        'Date': ['2011-06-16', '2011-06-13', '2011-06-09', '2011-06-13', '2011-06-18']})

df_have['Date'] = pd.to_datetime(df_have['Date'])

df_have.sort_values(['Customer_ID', 'Date'], inplace=True)
df_have

The DataFrame I currently have

df_want = pd.DataFrame({'Customer_ID':['customer_1','customer_2'],
                       'Time_Line':[['2011-06-09,purchased waffles,2011-06-13,purchased eggs,2011-06-16,purchased cornflakes'],
                                   ['2011-06-13,sold eggs,2011-06-18,purchased cows']]})
df_want

The DataFrame I want

Answer 1

Steps:

1) Set Customer_ID as the index, since it stays fixed throughout the operation.

2) stack so that, for each row, Date and Event fall one below the other.

3) Perform a groupby on the index (level=0) and convert each group's values into a list. Because the values were stacked in this sequence, dates and events alternate within each list.

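# set maximum width of columns to be displayed
pd.set_option('display.max_colwidth', 100)

# select the columns explicitly so that each Date precedes its Event,
# whatever order the columns happen to have in df_have
df_have.set_index('Customer_ID')[['Date', 'Event']].stack(
    ).groupby(level=0).apply(list).reset_index(name="Time_Line")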
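The three steps can also be unpacked into named intermediates, which makes it easier to inspect what each one produces. This is a minimal sketch, assuming a reasonably recent pandas; the names indexed, stacked and timeline are illustrative and not part of the original answer. Recent pandas versions may emit a FutureWarning about the stack implementation, which does not affect the result here.

```python
import pandas as pd

# Rebuild the example data from the question.
df_have = pd.DataFrame({
    'Customer_ID': ['customer_1', 'customer_1', 'customer_1', 'customer_2', 'customer_2'],
    'Event': ['purchased cornflakes', 'purchased eggs', 'purchased waffles',
              'sold eggs', 'purchased cows'],
    'Date': ['2011-06-16', '2011-06-13', '2011-06-09', '2011-06-13', '2011-06-18'],
})
df_have['Date'] = pd.to_datetime(df_have['Date'])
df_have = df_have.sort_values(['Customer_ID', 'Date'])

# Step 1: Customer_ID becomes the index; Date and Event remain as columns,
# selected explicitly so that Date comes first.
indexed = df_have.set_index('Customer_ID')[['Date', 'Event']]

# Step 2: stack() turns the two columns into one Series whose index is
# (Customer_ID, column name), so each row's Date sits directly above its Event.
stacked = indexed.stack()

# Step 3: group on the first index level and collect each group's values into a list.
timeline = stacked.groupby(level=0).apply(list).reset_index(name='Time_Line')
print(timeline)
```

Note that the lists hold Timestamp objects for the dates rather than strings, because the Date column was converted with pd.to_datetime.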

To change the order in which the values appear inside each list:

# reindex_axis was removed in pandas 1.0; reindex(columns=...) does the same job
df_have.set_index('Customer_ID').reindex(columns=['Event', 'Date']).stack(
    ).groupby(level=0).apply(list).reset_index(name="Time_Line")
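The desired output in the question shows each timeline as a single comma-separated string with dates formatted as YYYY-MM-DD, whereas the stack approach yields a list of alternating Timestamp and string values. If the string form is what is needed, one possible sketch (not part of the original answer; the Entry column and the df_timeline name are assumptions made for illustration) is to build a "date,event" string per row and join the rows per customer:

```python
import pandas as pd

# Rebuild the example data from the question.
df_have = pd.DataFrame({
    'Customer_ID': ['customer_1', 'customer_1', 'customer_1', 'customer_2', 'customer_2'],
    'Event': ['purchased cornflakes', 'purchased eggs', 'purchased waffles',
              'sold eggs', 'purchased cows'],
    'Date': ['2011-06-16', '2011-06-13', '2011-06-09', '2011-06-13', '2011-06-18'],
})
df_have['Date'] = pd.to_datetime(df_have['Date'])

# Sort chronologically within each customer, then build one "date,event" string per row.
entries = (
    df_have.sort_values(['Customer_ID', 'Date'])
           .assign(Entry=lambda d: d['Date'].dt.strftime('%Y-%m-%d') + ',' + d['Event'])
)

# Join each customer's entries into one string; groupby keeps the row order within groups.
df_timeline = (
    entries.groupby('Customer_ID')['Entry']
           .agg(','.join)
           .reset_index(name='Time_Line')
)
print(df_timeline)
```

The question's df_want wraps each string in a one-element list; if that exact shape matters, df_timeline['Time_Line'].map(lambda s: [s]) would add the wrapper.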

Answer 1: score 2, accepted, 2017-02-01 17:50:57
