
Python Pandas DataFrame - How to sum values in 1 column based on partial match in another column (date type)?

I have encountered some issues while processing my dataset using a Pandas DataFrame.

Here is my dataset:

[image]

My data types are displayed below:

[image]

My dataset is derived from:
MY_DATASET = pd.read_excel(EXCEL_FILE_PATH, index_col=None, na_values=['NA'], usecols="A, D")

  1. I would like to sum all values in the "NUMBER OF PEOPLE" column for each month in the "DATE" column. For example, all values in the "NUMBER OF PEOPLE" column would be added together as long as the value in the "DATE" column starts with "2020-01", then likewise for "2020-02", and so on.
    However, I am stuck since I am unsure how to use .groupby on a partial match.

  2. After 1) is completed, I am also trying to convert the values in the "DATE" column from YYYY-MM-DD to YYYY-MMM, like 2020-Jan.
    However, I am unsure if there is such a format.

Does anyone know how to resolve these issues?

Many thanks!


s = df['NUMBER OF PEOPLE'].groupby(pd.to_datetime(df['DATE']).dt.strftime('%Y-%b')).sum()
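
To try the one-liner above without the original spreadsheet, here is a minimal sketch. The sample rows are made up for illustration; only the column names "DATE" and "NUMBER OF PEOPLE" come from the question.

```python
import pandas as pd

# Hypothetical sample data standing in for the spreadsheet in the question.
df = pd.DataFrame({
    "DATE": ["2020-01-05", "2020-01-20", "2020-02-03", "2020-02-28", "2020-03-15"],
    "NUMBER OF PEOPLE": [10, 5, 7, 3, 4],
})

# Build a "YYYY-Mon" key from the dates, then sum the counts per key.
month_key = pd.to_datetime(df["DATE"]).dt.strftime("%Y-%b")
monthly_total = df["NUMBER OF PEOPLE"].groupby(month_key).sum()
print(monthly_total)
```

pd.to_datetime is only needed if "DATE" is stored as text; if the column is already datetime64, df['DATE'].dt.strftime(...) can be used directly.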

You can get an abbreviated month name using strftime('%b'). In English locales the result is already capitalized (e.g. Jan), but the output depends on the locale, so it may come back in lowercase:

df['group_date'] = df['DATE'].apply(lambda x: x.strftime('%Y-%b'))

If you need the first letter of the month in uppercase, you could do something like this:

df['group_date'] = df['group_date'].apply(lambda x: f'{x[0:5]}{x[5].upper()}{x[6:]}')

# or in one step:

df['group_date'] = df['DATE'].apply(lambda x: x.strftime('%Y-%b')).apply(lambda x: f'{x[0:5]}{x[5].upper()}{x[6:]}')
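
The same labeling can probably be done without apply, using the vectorized .dt and .str accessors. This is a sketch on made-up dates, not the answerer's code. It assumes the "DATE" column is already datetime64.

```python
import pandas as pd

# Hypothetical dates; "DATE" is assumed to be a datetime64 column.
df = pd.DataFrame({"DATE": pd.to_datetime(["2020-01-05", "2020-02-03", "2020-12-31"])})

# Format every date as "YYYY-Mon" in one vectorized call.
label = df["DATE"].dt.strftime("%Y-%b")

# Keep the "YYYY-" prefix and capitalize the month part,
# in case the locale returns the month name in lowercase.
df["group_date"] = label.str[:5] + label.str[5:].str.capitalize()
print(df)
```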

Now you just need to .groupby and .sum():

result = df['NUMBER OF PEOPLE'].groupby(df.group_date).sum()
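
One thing to watch: grouping by a text label sorts the result alphabetically, so "2020-Apr" lands before "2020-Jan". If chronological order matters, one option is to group by a monthly period first and format the labels afterwards. This is a sketch with invented sample rows, assuming "DATE" is a datetime64 column.

```python
import pandas as pd

# Hypothetical sample data, deliberately out of order.
df = pd.DataFrame({
    "DATE": pd.to_datetime(["2020-04-01", "2020-01-05", "2020-01-20", "2020-02-03"]),
    "NUMBER OF PEOPLE": [2, 10, 5, 7],
})

# Group by a monthly Period so the result is sorted chronologically.
by_month = df["NUMBER OF PEOPLE"].groupby(df["DATE"].dt.to_period("M")).sum()

# Convert the PeriodIndex to "YYYY-Mon" text labels for display only.
by_month.index = by_month.index.strftime("%Y-%b")
print(by_month)
```

The totals come out in January, February, April order, and the labels are only turned into text at the very end.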

I did some tinkering around and found that this worked for me as well:

[image]

[image]

Cheers all
