Python Pandas DataFrame - How to sum values in 1 column based on partial match in another column (date type)?

Question

I have encountered some issues while processing my dataset using Pandas DataFrame.

Here is my dataset:

My data types are displayed below:

My dataset is derived from:
MY_DATASET = pd.read_excel(EXCEL_FILE_PATH, index_col = None, na_values = ['NA'], usecols = "A, D")

1) I would like to sum all values in the "NUMBER OF PEOPLE" column for each month in the "DATE" column. For example, all values in the "NUMBER OF PEOPLE" column would be added together whenever the value in the "DATE" column starts with "2020-01", then "2020-02", and so on. However, I am stuck since I am unsure how to use .groupby on a partial match.
2) After 1) is completed, I would also like to convert the values in the "DATE" column from YYYY-MM-DD to YYYY-MMM, like 2020-Jan. However, I am unsure if there is such a format.

Does anyone know how to resolve these issues?

Many thanks!

Answer 1

Try this:

s = df['NUMBER OF PEOPLE'].groupby(pd.to_datetime(df['DATE']).dt.strftime('%Y-%b')).sum()
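For anyone who wants to try the one-liner above without the original Excel file, here is a minimal sketch. The sample rows are made up for illustration and are not the asker's data; only the column names come from the question.

```python
import pandas as pd

# Hypothetical sample rows standing in for the Excel sheet in the question.
df = pd.DataFrame({
    "DATE": ["2020-01-05", "2020-01-20", "2020-02-03", "2020-02-28", "2020-03-15"],
    "NUMBER OF PEOPLE": [10, 5, 7, 3, 8],
})

# Parse DATE as datetime, then format each value as YYYY-Mon (e.g. 2020-Jan).
# The formatted string is the "partial match" key: every day in a month maps to the same label.
month_key = pd.to_datetime(df["DATE"]).dt.strftime("%Y-%b")

# Group the people counts by that label and sum within each month.
s = df["NUMBER OF PEOPLE"].groupby(month_key).sum()
print(s)
```

Note that the group labels are plain strings, so the result is sorted alphabetically (2020-Feb comes before 2020-Jan), and the month abbreviation produced by %b depends on the active locale.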

Answer 2

You can get an abbreviated month name using strftime('%b') (or the full name with '%B'). The output depends on the locale, and in some locales the month name comes back all in lowercase. This assumes the DATE column already has a datetime dtype:

df['group_date'] = df['DATE'].apply(lambda x: x.strftime('%Y-%b'))

If you need the first letter of the month in uppercase, you could do something like this:

df['group_date'] = df['group_date'].apply(lambda x: f'{x[0:5]}{x[5].upper()}{x[6:]}')

# or in one step:

df['group_date'] = df['DATE'].apply(lambda x: x.strftime('%Y-%b')).apply(lambda x: f'{x[0:5]}{x[5].upper()}{x[6:]}')

Now you just need to .groupby and .sum():

result = df['NUMBER OF PEOPLE'].groupby(df.group_date).sum()
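One caveat with grouping on formatted strings is that the months come out in alphabetical rather than calendar order. A possible variation, not part of the original answer, is to group on a monthly period first and only format the labels afterwards. The sketch below uses made-up sample rows and assumes DATE is already a datetime column.

```python
import pandas as pd

# Hypothetical sample rows; DATE is already datetime here.
df = pd.DataFrame({
    "DATE": pd.to_datetime(["2020-04-02", "2020-01-05", "2020-01-20", "2020-02-03"]),
    "NUMBER OF PEOPLE": [4, 10, 5, 7],
})

# Group on a monthly Period so the result is ordered by calendar month,
# not by the alphabetical order of the month names.
monthly = df.groupby(df["DATE"].dt.to_period("M"))["NUMBER OF PEOPLE"].sum()

# Relabel the index as YYYY-Mon for display only, after the sums are computed.
monthly.index = monthly.index.strftime("%Y-%b")
print(monthly)
```

Months with no rows (March in this sample) simply do not appear in the result.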

Answer 3

I did some tinkering around and found that this worked for me as well:

Cheers all

3 answers

solution1
2 ACCEPTED 2020-08-29 14:57:51

solution2
2 2020-08-29 16:09:27

solution3
0 2020-08-30 03:58:44
