Pandas iteratively append row values from multiple DataFrame columns
I want to iteratively append row values from multiple columns to a new column in a new DataFrame, based on a group.
My goal is to have one row per customer, with one column for the customer's ID and one column for their timeline. The timeline lists the date of each event followed by the event description, for all dates and events, in chronological order.
I have solved this with a series of dictionaries. I am searching for a clean, elegant, pandas-style way to accomplish this, as this code will be run frequently with small changes to customers, events, etc.
Example:
import pandas as pd
df_have = pd.DataFrame({'Customer_ID':['customer_1','customer_1','customer_1','customer_2','customer_2'],
'Event':['purchased cornflakes','purchased eggs', 'purchased waffles','sold eggs','purchased cows'],
'Date':['2011-06-16','2011-06-13','2011-06-09','2011-06-13','2011-06-18']})
df_have['Date'] = pd.to_datetime(df_have['Date'])
df_have.sort_values(['Customer_ID','Date'], inplace =True)
df_have
df_want = pd.DataFrame({'Customer_ID':['customer_1','customer_2'],
'Time_Line':[['2011-06-09,purchased waffles,2011-06-13,purchased eggs,2011-06-16,purchased cornflakes'],
['2011-06-13,sold eggs,2011-06-18,purchased cows']]})
df_want
Steps:
1) Set Customer_ID to be the index axis, as it remains static throughout the operation.
2) stack so that Date and Event fall below one another.
3) Perform groupby with respect to the index (level=0) and convert the only column into a list. Since the values are stacked in column order, they alternate within each list.
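# set the maximum column width for display so the lists are not truncated
pd.set_option('display.max_colwidth', 100)
# select the columns explicitly so each pair is stacked as Date, then Event
df_have.set_index('Customer_ID')[['Date', 'Event']].stack(
).groupby(level=0).apply(list).reset_index(name="Time_Line")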
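The chained expression above can be unpacked into its three steps to make the intermediate results easier to inspect. This is only a sketch; the variable names `indexed`, `stacked`, and `timeline` are illustrative and not part of the original answer, and the columns are selected explicitly so that the order inside each list does not depend on how the DataFrame was constructed.

```python
import pandas as pd

df_have = pd.DataFrame({
    'Customer_ID': ['customer_1', 'customer_1', 'customer_1', 'customer_2', 'customer_2'],
    'Event': ['purchased cornflakes', 'purchased eggs', 'purchased waffles',
              'sold eggs', 'purchased cows'],
    'Date': ['2011-06-16', '2011-06-13', '2011-06-09', '2011-06-13', '2011-06-18'],
})
df_have['Date'] = pd.to_datetime(df_have['Date'])
df_have = df_have.sort_values(['Customer_ID', 'Date'])

# Step 1: Customer_ID becomes the index; Date and Event stay as columns.
indexed = df_have.set_index('Customer_ID')[['Date', 'Event']]

# Step 2: stack turns each row into two consecutive entries (Date, then Event).
stacked = indexed.stack()

# Step 3: group by the Customer_ID index level and collect each group into a list.
timeline = stacked.groupby(level=0).apply(list).reset_index(name='Time_Line')
print(timeline)
```

Note that the dates in each list are Timestamp objects here, not strings.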
To change the order in which the values appear inside the list:
# reindex_axis was removed in pandas 1.0; reindex(columns=...) replaces it
df_have.set_index('Customer_ID').reindex(columns=['Event', 'Date']).stack(
).groupby(level=0).apply(list).reset_index(name="Time_Line")
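If the timeline should be a single comma-separated string with plain dates, as in df_want, one possible alternative is to format each row as text first and then join per customer. This is a sketch under that assumption and not the approach from the answer above; the `Entry` column and the `df_timeline` name are made up for illustration.

```python
import pandas as pd

df_have = pd.DataFrame({
    'Customer_ID': ['customer_1', 'customer_1', 'customer_1', 'customer_2', 'customer_2'],
    'Event': ['purchased cornflakes', 'purchased eggs', 'purchased waffles',
              'sold eggs', 'purchased cows'],
    'Date': ['2011-06-16', '2011-06-13', '2011-06-09', '2011-06-13', '2011-06-18'],
})
df_have['Date'] = pd.to_datetime(df_have['Date'])

# One "date,event" string per row, with the date formatted as YYYY-MM-DD.
entries = df_have['Date'].dt.strftime('%Y-%m-%d') + ',' + df_have['Event']

df_timeline = (
    df_have.assign(Entry=entries)
    .sort_values(['Customer_ID', 'Date'])   # chronological order within each customer
    .groupby('Customer_ID')['Entry']
    .agg(','.join)                          # concatenate the entries of each customer
    .reset_index(name='Time_Line')
)
print(df_timeline)
```

Here each Time_Line value is a plain string rather than a one-element list; wrapping it in a list afterwards is straightforward if that shape is required.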