Need help counting strings under Pandas DataFrame headers over a monthly timeline

Question

I want to count how often each string appears under different columns of a Pandas DataFrame, using a Pandas pivot table, so I can analyse monthly trends. The data looks like this:

name    age  city        Date                  country   hight  MessageList  gender
Tom     10   NewYork       1/1/2021 08:35:58Z  US        NaN      X List      Male
Mark     5   London        5/1/2021 08:35:58Z  UK        NaN      X List      Male
Pam      7   London        3/6/2021 08:35:58Z  UK        NaN      Y List      Female
Tom     18   California    4/6/2021 08:35:58Z  US        163      Y List      Male
Lena    23   NewYork     12/12/2020 08:35:58Z  US        NaN      Y List      Female
Ben     17   Colombo     11/12/2020 08:35:58Z  Srilanka  NaN      X List      Male
Lena    23   Paris         8/1/2020 08:35:58Z  France    NaN      Y List      Female
Ben     51   Colombo       7/1/2020 08:35:58Z  Srilanka  NaN      Z List      Male
Tom     18   Paris         1/1/2021 08:35:58Z  France    NaN      Z List      Male
Mark     5   Paris         5/1/2021 08:35:58Z  Japan     NaN      Z List      Male
Tom     18   London        3/6/2021 08:35:58Z  UK        NaN      X List      Male
Tom     18   Paris         4/6/2021 08:35:58Z  France    163      Z List      Male
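To reproduce the examples, the sample above can be loaded into a DataFrame roughly as follows. This is a sketch that assumes the dates are month/day/year (which matches the month labels in the answer's output) and that the trailing Z means UTC; the names rows and columns are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the table above; "hight" keeps the original spelling.
rows = [
    ("Tom", 10, "NewYork", "1/1/2021 08:35:58Z", "US", np.nan, "X List", "Male"),
    ("Mark", 5, "London", "5/1/2021 08:35:58Z", "UK", np.nan, "X List", "Male"),
    ("Pam", 7, "London", "3/6/2021 08:35:58Z", "UK", np.nan, "Y List", "Female"),
    ("Tom", 18, "California", "4/6/2021 08:35:58Z", "US", 163, "Y List", "Male"),
    ("Lena", 23, "NewYork", "12/12/2020 08:35:58Z", "US", np.nan, "Y List", "Female"),
    ("Ben", 17, "Colombo", "11/12/2020 08:35:58Z", "Srilanka", np.nan, "X List", "Male"),
    ("Lena", 23, "Paris", "8/1/2020 08:35:58Z", "France", np.nan, "Y List", "Female"),
    ("Ben", 51, "Colombo", "7/1/2020 08:35:58Z", "Srilanka", np.nan, "Z List", "Male"),
    ("Tom", 18, "Paris", "1/1/2021 08:35:58Z", "France", np.nan, "Z List", "Male"),
    ("Mark", 5, "Paris", "5/1/2021 08:35:58Z", "Japan", np.nan, "Z List", "Male"),
    ("Tom", 18, "London", "3/6/2021 08:35:58Z", "UK", np.nan, "X List", "Male"),
    ("Tom", 18, "Paris", "4/6/2021 08:35:58Z", "France", 163, "Z List", "Male"),
]
columns = ["name", "age", "city", "Date", "country", "hight", "MessageList", "gender"]
df = pd.DataFrame(rows, columns=columns)

# Assumption: month/day/year order; %z accepts the trailing "Z" as UTC.
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y %H:%M:%S%z")
print(df.dtypes)
```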

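import pandas as pd

# Count rows per (name, city). A column that is also in `index` cannot be
# used as `values`, and np.count_nonzero() called with no argument raises a
# TypeError, so count a non-null column such as 'age' with aggfunc='count'.
table = pd.pivot_table(df, values='age', index=['name', 'city'],
                       aggfunc='count')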
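For comparison with the pivot_table attempt above, a pivot table can also give per-month counts once each row carries a monthly period to use as the columns key. This is a sketch on a small hypothetical subset of the sample (six rows, three columns), assuming month/day/year dates; aggfunc="size" counts rows, so no values column is needed.

```python
import pandas as pd

# Hypothetical six-row subset of the sample, only the columns used here.
df = pd.DataFrame({
    "name": ["Tom", "Mark", "Pam", "Tom", "Lena", "Ben"],
    "city": ["NewYork", "London", "London", "California", "NewYork", "Colombo"],
    "Date": ["1/1/2021 08:35:58Z", "5/1/2021 08:35:58Z", "3/6/2021 08:35:58Z",
             "4/6/2021 08:35:58Z", "12/12/2020 08:35:58Z", "11/12/2020 08:35:58Z"],
})
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y %H:%M:%S%z")

# Monthly period per row; removing the time zone first avoids the
# "will drop timezone information" warning from to_period.
df["month"] = df["Date"].dt.tz_localize(None).dt.to_period("M")

# aggfunc="size" counts rows per (name, city, month); empty cells become 0.
table = pd.pivot_table(df, index=["name", "city"], columns="month",
                       aggfunc="size", fill_value=0)
print(table)
```

Because the month column holds Period values, the resulting columns sort chronologically rather than alphabetically by month name.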

I am new to Pandas and am struggling to get the string counts broken down by month.

I am looking for output like this:

            2020         2021
Name        Nov   Dec    Jan    Feb
Tom
 Paris      3     1      2      3
 Colombo    2     3             3
 London     4     1      4      2
Mark
 Colombo    1            3      1
 London     3     3      2      2
Pam
 California 3     1             1
 NewYork    1            4      2
Len
 London     1     2      2      1

Answer 1

Use crosstab with monthly periods created by Series.dt.to_period. The resulting PeriodIndex columns can then be turned into a two-level MultiIndex (year, abbreviated month name) with PeriodIndex.year and PeriodIndex.strftime:

df['Date'] = pd.to_datetime(df['Date'])
table = pd.crosstab([df['name'], df['city']], df['Date'].dt.to_period('m'))

table.columns = [table.columns.year, table.columns.strftime('%b')]

print (table)
Date            2020             2021            
Date             Jul Aug Nov Dec  Jan Mar Apr May
name city                                        
Ben  Colombo       1   0   1   0    0   0   0   0
Lena NewYork       0   0   0   1    0   0   0   0
     Paris         0   1   0   0    0   0   0   0
Mark London        0   0   0   0    0   0   0   1
     Paris         0   0   0   0    0   0   0   1
Pam  London        0   0   0   0    0   1   0   0
Tom  California    0   0   0   0    0   0   1   0
     London        0   0   0   0    0   1   0   0
     NewYork       0   0   0   0    1   0   0   0
     Paris         0   0   0   0    1   0   1   0
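crosstab only creates columns for months that actually occur in the data, so the output above has no Sep, Oct or Feb columns. If a continuous monthly timeline is wanted, one option is to reindex the columns over pd.period_range before building the year/month header. A sketch on a hypothetical four-row subset, assuming month/day/year dates:

```python
import pandas as pd

# Hypothetical four-row subset of the sample.
df = pd.DataFrame({
    "name": ["Ben", "Lena", "Ben", "Tom"],
    "city": ["Colombo", "Paris", "Colombo", "Paris"],
    "Date": ["7/1/2020 08:35:58Z", "8/1/2020 08:35:58Z",
             "11/12/2020 08:35:58Z", "1/1/2021 08:35:58Z"],
})
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y %H:%M:%S%z")

# Drop the time zone before to_period to avoid the timezone warning.
months = df["Date"].dt.tz_localize(None).dt.to_period("M")
table = pd.crosstab([df["name"], df["city"]], months)

# Add the months with no rows between the first and last month, filled with 0.
full_range = pd.period_range(months.min(), months.max(), freq="M")
table = table.reindex(columns=full_range, fill_value=0)

# Same two-level (year, month name) header as in the answer.
table.columns = [table.columns.year, table.columns.strftime("%b")]
print(table)
```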

Answer 1 (votes: 2, accepted 2021-07-16 09:25:47)

需要帮助计算带有时间线的 Pandas 数据框头部下的字符串

问题描述

1 个解决方案

解决方案1 2 已采纳 2021-07-16 09:25:47

解决方案1
2 已采纳 2021-07-16 09:25:47