How to count occurrences of strings starting with a specific substring from comma-separated values in a pandas DataFrame?

Question

I am new to Python. I am working with a dataframe (360000 rows and 2 columns) that looks something like this:

business_id date
P01         2019-07-6 , 2018-06-05, 2019-07-06...
P02         2016-03-6 , 2019-04-10
P03         2019-01-02

The date column holds comma-separated dates, ranging from 2010 to 2019. For each business id, I am trying to count the dates per month, but only those in 2019. Specifically, I am looking for the output:

Can anyone please help me? Thanks.

Answer 1

You can do as follows:

first use str.split to separate the dates in each cell into a list,
then explode to flatten the lists,
convert to datetime with pd.to_datetime and extract the month (as a year-month period),
finally use pd.crosstab to pivot/count the months and join.
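The steps above can be traced one at a time on a small sample. This is a minimal sketch, assuming a frame shaped like the one in the question; the sample rows are hypothetical and the dates are zero-padded for simplicity.

```python
import pandas as pd

# Hypothetical sample modelled on the question (not the asker's real data).
df = pd.DataFrame({
    "business_id": ["P01", "P02", "P03"],
    "date": ["2019-07-06 , 2018-06-05, 2019-07-06",
             "2016-03-06 , 2019-04-10",
             "2019-01-02"],
})

# Step 1: split each cell on commas, swallowing surrounding whitespace.
lists = df["date"].str.split(r"\s*,\s*")

# Step 2: one row per date; the original row label repeats in the index.
flat = lists.explode()

# Step 3: parse to datetime and keep only the year-month part.
months = pd.to_datetime(flat).dt.to_period("M")

# Step 4: count dates per original row and month.
counts = pd.crosstab(months.index, months)
print(counts)
```

Because explode repeats the original row label for every date, the index of `months` is what ties each date back to its business.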

Altogether:

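import pandas as pd

# split on commas, flatten to one date per row, reduce to year-month periods
s = pd.to_datetime(df['date'].str.split(r'\s*,\s*').explode()).dt.to_period('M')

# count dates per original row (s.index) and month
out = pd.crosstab(s.index, s)

# attach the per-month counts to the original rows
df.join(out)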

Output (out):

date   2016-03  2018-06  2019-01  2019-04  2019-07
row_0                                             
0            0        1        0        0        2
1            1        0        0        1        0
2            0        0        1        0        0
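The output above counts every year, whereas the question asks only for 2019. One way to restrict it is to filter the exploded dates before calling pd.crosstab. The following is a sketch under assumptions: the sample rows are hypothetical (P04 is added only to show a business with no 2019 dates), and the reindex step is one possible way to keep such rows as zeros.

```python
import pandas as pd

# Hypothetical sample; P04 is an invented row with no 2019 dates.
df = pd.DataFrame({
    "business_id": ["P01", "P02", "P03", "P04"],
    "date": ["2019-07-06 , 2018-06-05, 2019-07-06",
             "2016-03-06 , 2019-04-10",
             "2019-01-02",
             "2017-05-01"],
})

dates = pd.to_datetime(df["date"].str.split(r"\s*,\s*").explode())

# Keep only the 2019 dates, then count per original row and month.
in_2019 = dates[dates.dt.year == 2019]
counts = pd.crosstab(in_2019.index, in_2019.dt.to_period("M"))

# Rows without any 2019 date drop out of the crosstab; restore them as zeros.
counts = counts.reindex(df.index, fill_value=0)

result = df.join(counts)
print(result)
```

Filtering on `dt.year` after parsing avoids matching on the raw string prefix, so it does not depend on how the dates are padded or spaced.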

Answer 2

If they are not datetime objects yet, you may want to start by converting the column (series) to datetime with pd.to_datetime(). Note the format parameter.

Then you can access the datetime attributes through .dt

e.g. df[df.COLUMN_NAME.dt.month == 5]
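As a rough illustration of this approach, the sketch below assumes the column already holds a single date string per cell (comma-separated cells would first need the split and explode steps from Answer 1). The frame and the column name `date` are hypothetical.

```python
import pandas as pd

# Hypothetical frame with one date string per row.
df = pd.DataFrame({"date": ["2019-05-06", "2018-05-05", "2019-07-06"]})

# An explicit format states the expected layout of the strings.
df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d")

# Boolean masks built from .dt attributes can be combined with &.
may_2019 = df[(df["date"].dt.year == 2019) & (df["date"].dt.month == 5)]
print(may_2019)
```

Each comparison needs its own parentheses, because `&` binds more tightly than `==` in Python.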
