
Need help counting strings under the column heads of a Pandas data frame with timelines

I want to count how often each string occurs under different column heads in a Pandas dataframe, using a Pandas pivot table, so that I can analyze monthly trends. The data looks like this:

name    age  city        Date                  country   hight  MessageList  gender
Tom     10   NewYork       1/1/2021 08:35:58Z  US        NaN      X List      Male
Mark     5   London        5/1/2021 08:35:58Z  UK        NaN      X List      Male
Pam      7   London        3/6/2021 08:35:58Z  UK        NaN      Y List      Female
Tom     18   California    4/6/2021 08:35:58Z  US        163      Y List      Male
Lena    23   NewYork     12/12/2020 08:35:58Z  US        NaN      Y List      Female
Ben     17   Colombo     11/12/2020 08:35:58Z  Srilanka  NaN      X List      Male
Lena    23   Paris         8/1/2020 08:35:58Z  France    NaN      Y List      Female
Ben     51   Colombo       7/1/2020 08:35:58Z  Srilanka  NaN      Z List      Male
Tom     18   Paris         1/1/2021 08:35:58Z  France    NaN      Z List      Male
Mark     5   Paris         5/1/2021 08:35:58Z  Japan     NaN      Z List      Male
Tom     18   London        3/6/2021 08:35:58Z  UK        NaN      X List      Male
Tom     18   Paris         4/6/2021 08:35:58Z  France    163      Z List      Male

import pandas as pd
import numpy as np
table = pd.pivot_table(df, values='name', index=['name', 'city'],
                       aggfunc=np.count_nonzero())

I am new to Pandas and am struggling to get the string counts broken down by month.

I am looking for output like this:

            2020         2021
Name        Nov   Dec    Jan    Feb
Tom
 Paris      3     1      2      3
 Colombo    2     3             3
 London     4     1      4      2
Mark
 Colombo    1            3      1
 London     3     3      2      2
Pam
 California 3     1             1
 NewYork    1            4      2
Len
 London     1     2      2      1

Use crosstab with monthly periods created by Series.dt.to_period. The resulting column labels can then be turned into a MultiIndex in the output using PeriodIndex.year and PeriodIndex.strftime:

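# Parse the Date strings into datetimes (month-first by default)
df['Date'] = pd.to_datetime(df['Date'])
# Count rows per (name, city) pair for each calendar month
table = pd.crosstab([df['name'], df['city']], df['Date'].dt.to_period('M'))

# Replace the monthly PeriodIndex with (year, month abbreviation) column labels
table.columns = [table.columns.year, table.columns.strftime('%b')]

print(table)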
Date            2020             2021            
Date             Jul Aug Nov Dec  Jan Mar Apr May
name city                                        
Ben  Colombo       1   0   1   0    0   0   0   0
Lena NewYork       0   0   0   1    0   0   0   0
     Paris         0   1   0   0    0   0   0   0
Mark London        0   0   0   0    0   0   0   1
     Paris         0   0   0   0    0   0   0   1
Pam  London        0   0   0   0    0   1   0   0
Tom  California    0   0   0   0    0   0   1   0
     London        0   0   0   0    0   1   0   0
     NewYork       0   0   0   0    1   0   0   0
     Paris         0   0   0   0    1   0   1   0
