Python Pandas DataFrame - How to sum values in 1 column based on partial match in another column (date type)?

I have encountered some issues while processing my dataset using Pandas DataFrame.

Here is my dataset:

[image: dataset preview, not preserved]

My data types are displayed below:

[image: column data types, not preserved]

My dataset is derived from:
MY_DATASET = pd.read_excel(EXCEL_FILE_PATH, index_col = None, na_values = ['NA'], usecols = "A, D")

  1. I would like to sum all values in the "NUMBER OF PEOPLE" column for each month in the "DATE" column. For example, all values in the "NUMBER OF PEOPLE" column should be added together whenever the value in the "DATE" column falls in "2020-01", then again for "2020-02", and so on.
    However, I am stuck because I am unsure how to use .groupby on a partial match.

  2. After 1) is completed, I am also trying to convert the values in the "DATE" column from YYYY-MM-DD to YYYY-MMM, like 2020-Jan.
    However, I am unsure if there is such a format.

Does anyone know how to resolve these issues?

Many thanks!

s = df['NUMBER OF PEOPLE'].groupby(pd.to_datetime(df['DATE']).dt.strftime('%Y-%b')).sum()
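
To make the one-liner above easier to try out, here is a minimal sketch on a small made-up frame. The column names follow the question, but the values are invented for illustration, and the exact month abbreviations printed depend on the system locale.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "DATE": ["2020-01-05", "2020-01-20", "2020-02-03", "2020-02-28", "2020-03-15"],
    "NUMBER OF PEOPLE": [10, 5, 7, 3, 4],
})

# Convert to datetime, build a "YYYY-Mon" text key, then sum per key.
month_key = pd.to_datetime(df["DATE"]).dt.strftime("%Y-%b")
s = df["NUMBER OF PEOPLE"].groupby(month_key).sum()
print(s)
```

The key here is a Series aligned with `df`, so `groupby` does not need the key to be a column of the frame.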

You can get an abbreviated month name using strftime('%b'), but in some locales the month name will be all in lowercase:

df['group_date'] = df['DATE'].apply(lambda x: x.strftime('%Y-%b'))

If you need the first letter of the month in uppercase, you could do something like this:

df['group_date'] = df['group_date'].apply(lambda x: f'{x[0:5]}{x[5].upper()}{x[6:]}')

# or in one step:

df['group_date'] = df['DATE'].apply(lambda x: x.strftime('%Y-%b')).apply(lambda x: f'{x[0:5]}{x[5].upper()}{x[6:]}')
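
As an alternative sketch, the same capitalization can be done with pandas string methods instead of a per-row lambda. The sample values below are made up, and note one difference from the lambda: `str.capitalize` also lowercases the remaining letters of the month name.

```python
import pandas as pd

# Hypothetical keys in "YYYY-mon" form, with mixed capitalization.
group_date = pd.Series(["2020-jan", "2020-feb", "2020-Mar"])

# Keep the "YYYY-" prefix (5 characters) and capitalize the month part.
fixed = group_date.str[:5] + group_date.str[5:].str.capitalize()
print(fixed.tolist())  # ['2020-Jan', '2020-Feb', '2020-Mar']
```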

Now you just need to .groupby and .sum():

result = df['NUMBER OF PEOPLE'].groupby(df.group_date).sum()
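
One caveat with text keys such as "2020-Apr" is that groupby sorts them alphabetically, so April lands before January. A possible way around this, sketched here on invented data with the question's column names, is to group on a monthly period first and format the labels only at the end:

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "DATE": pd.to_datetime(["2020-01-05", "2020-04-20", "2020-02-03", "2020-04-28"]),
    "NUMBER OF PEOPLE": [10, 5, 7, 3],
})

# Group on a monthly Period so the result is sorted chronologically.
monthly = df.groupby(df["DATE"].dt.to_period("M"))["NUMBER OF PEOPLE"].sum()

# Turn the period labels into "YYYY-Mon" text after the sums are computed.
monthly.index = monthly.index.strftime("%Y-%b")
print(monthly)
```

The sums come out in calendar order (January, February, April) regardless of how the formatted labels would sort as strings.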

I did some tinkering around and found that this worked for me as well:

[two images, not preserved]

Cheers all
