Comparing values of same column depending on values in different column from the same pandas dataframe
Here is my data frame:
month | year | category | size | sold |
---|---|---|---|---|
6 | 2022 | shirt | M | 52 |
5 | 2022 | shirt | M | 45 |
1 | 2022 | shirt | S | 61 |
12 | 2021 | shirt | S | 89 |
12 | 2021 | pant | S | 72 |
7 | 2022 | shirt | M | 42 |
8 | 2022 | shirt | M | 55 |
8 | 2022 | pants | 41 | 9 |
What I would like is to roll up the previous month into another column, like this:
current_month_year | previous_month_year | category | size | sold_current | sold_previous |
---|---|---|---|---|---|
6-2022 | 5-2022 | shirt | M | 52 | 45 |
1-2022 | 12-2021 | shirt | S | 61 | 89 |
12-2021 | | pant | S | 72 | 0 |
8-2022 | 7-2022 | shirt | M | 55 | 42 |
8-2022 | | pant | 41 | 9 | 0 |
How would I do this?
I have no idea how to do this, so I don't have any code to show.
I will start by creating the column `current_month_year` from the columns `month` and `year`:
df['current_month_year'] = df['month'].astype(str) + '-' + df['year'].astype(str)
[Out]:
month year category size sold current_month_year
0 6 2022 shirt M 52 6-2022
1 5 2022 shirt M 45 5-2022
2 1 2022 shirt S 61 1-2022
3 12 2021 shirt S 89 12-2021
4 12 2021 pant S 72 12-2021
Next, let's focus on the column `previous_month_year`. The approach is similar at first, but then we will add the conditions:
df['previous_month_year'] = df['month'].astype(str) + '-' + df['year'].astype(str)
[Out]:
month year category size sold current_month_year previous_month_year
0 6 2022 shirt M 52 6-2022 6-2022
1 5 2022 shirt M 45 5-2022 5-2022
2 1 2022 shirt S 61 1-2022 1-2022
3 12 2021 shirt S 89 12-2021 12-2021
4 12 2021 pant S 72 12-2021 12-2021
Then let's add the conditions:
• If `month` is 1 (January), the previous month is 12 (December) of the previous year (`year - 1`):
df.loc[df['month'] == 1, 'previous_month_year'] = '12-' + (df['year'] - 1).astype(str)
[Out]:
month year category size sold current_month_year previous_month_year
0 6 2022 shirt M 52 6-2022 6-2022
1 5 2022 shirt M 45 5-2022 5-2022
2 1 2022 shirt S 61 1-2022 12-2021
3 12 2021 shirt S 89 12-2021 12-2021
4 12 2021 pant S 72 12-2021 12-2021
• If `month` is not 1 (January), subtract 1 from `month` and keep `year` unchanged:
df.loc[df['month'] != 1, 'previous_month_year'] = (df['month'] - 1).astype(str) + '-' + df['year'].astype(str)
[Out]:
month year category size sold current_month_year previous_month_year
0 6 2022 shirt M 52 6-2022 5-2022
1 5 2022 shirt M 45 5-2022 4-2022
2 1 2022 shirt S 61 1-2022 12-2021
3 12 2021 shirt S 89 12-2021 11-2021
4 12 2021 pant S 72 12-2021 11-2021
5 7 2022 shirt M 42 7-2022 6-2022
6 8 2022 shirt M 55 8-2022 7-2022
7 8 2022 pants 41 9 8-2022 7-2022
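The two conditions above can also be folded into one step by letting pandas do the calendar arithmetic. The following is only a sketch under assumptions: it uses a small three-row frame made up for illustration, and the intermediate names `current` and `previous` are not part of the original answer.

```python
import pandas as pd

# Small illustrative frame (not the full data set from the question)
df = pd.DataFrame({"month": [6, 1, 12], "year": [2022, 2022, 2021]})

# Build one monthly Period per row; day=1 is only a placeholder that
# pd.to_datetime needs in order to assemble a date from year/month columns.
current = pd.to_datetime(df[["year", "month"]].assign(day=1)).dt.to_period("M")

# Subtracting 1 from a monthly Period moves back one month and
# rolls January over to December of the previous year.
previous = current - 1

df["current_month_year"] = current.dt.month.astype(str) + "-" + current.dt.year.astype(str)
df["previous_month_year"] = previous.dt.month.astype(str) + "-" + previous.dt.year.astype(str)
```

With this approach no special case for January is needed, at the cost of an extra datetime conversion.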
Finally, as OP wants the column `sold_previous` to show the number of items sold in `previous_month_year` for a given `current_month_year`, the following will do the work (if there is no row for the previous month, the value 0 is used):
df['sold_previous'] = 0
for i in df.index:
    # Rows of the same category and size whose month is this row's previous month
    mask = (
        (df['current_month_year'] == df.loc[i, 'previous_month_year'])
        & (df['category'] == df.loc[i, 'category'])
        & (df['size'] == df.loc[i, 'size'])
    )
    if mask.any():
        df.loc[i, 'sold_previous'] = df.loc[mask, 'sold'].values[0]
[Out]:
month year category ... current_month_year previous_month_year sold_previous
0 6 2022 shirt ... 6-2022 5-2022 45
1 5 2022 shirt ... 5-2022 4-2022 0
2 1 2022 shirt ... 1-2022 12-2021 89
3 12 2021 shirt ... 12-2021 11-2021 0
4 12 2021 pant ... 12-2021 11-2021 0
5 7 2022 shirt ... 7-2022 6-2022 52
6 8 2022 shirt ... 8-2022 7-2022 42
7 8 2022 pants ... 8-2022 7-2022 0
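For larger frames, a loop-free alternative may be worth considering. The sketch below assumes that a previous-month row should only count when it has the same `category` and `size`, and it introduces a helper column `ym` (a running month number) that is not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "month": [6, 5, 1, 12, 12, 7, 8, 8],
    "year": [2022, 2022, 2022, 2021, 2021, 2022, 2022, 2022],
    "category": ["shirt", "shirt", "shirt", "shirt", "pant", "shirt", "shirt", "pants"],
    "size": ["M", "M", "S", "S", "S", "M", "M", "41"],
    "sold": [52, 45, 61, 89, 72, 42, 55, 9],
})

# Hypothetical helper: a running month number, so the previous month is ym - 1
# and the December -> January boundary needs no special handling.
df["ym"] = df["year"] * 12 + df["month"]

# Copy of the sales, shifted one month forward so that each row lines up
# with the month that follows it.
prev = df[["ym", "category", "size", "sold"]].rename(columns={"sold": "sold_previous"})
prev["ym"] += 1

# Left merge keeps every original row, in order; rows without a match get NaN.
out = df.merge(prev, on=["ym", "category", "size"], how="left")
out["sold_previous"] = out["sold_previous"].fillna(0).astype(int)
```

The merge assumes at most one row per month, category, and size; duplicates would multiply rows in `out`.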
If one wants to change column names, for example `sold` to `sold_current`, one can do the following (this adds `sold_current` as a copy and keeps the original `sold` column):
df['sold_current'] = df['sold']
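If the goal is a true rename rather than an extra column, `DataFrame.rename` is one option. A minimal sketch on a two-row frame made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({"month": [6, 5], "year": [2022, 2022], "sold": [52, 45]})

# rename returns a new DataFrame; the original df is left unchanged
renamed = df.rename(columns={"sold": "sold_current"})
```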
You can create a new DataFrame with the output columns you want and iterate over the original DataFrame, filtering by size and category in the adjacent month to get the previous sales. The current sales and the rest of the columns are easy to add: copy them from one DataFrame to the other and apply a small transformation to calculate the previous month.
I would do something like this:
import pandas as pd

df = pd.DataFrame({
    'month': [6, 5, 1, 12, 12, 7, 8, 8],
    'year': [2022, 2022, 2022, 2021, 2021, 2022, 2022, 2022],
    'category': ['shirt', 'shirt', 'shirt', 'shirt', 'pant', 'shirt', 'shirt', 'pants'],
    'size': ['M', 'M', 'S', 'S', 'S', 'M', 'M', '41'],
    'sold': [52, 45, 61, 89, 72, 42, 55, 9],
})
# Output frame with the requested columns
df_new = pd.DataFrame(columns=['current_month_year', 'previous_month_year', 'category', 'size', 'sold_current', 'sold_previous'])
df_new['current_month_year'] = df['month'].astype(str) + '-' + df['year'].astype(str)
# Previous month label; January rolls back to December of the previous year
for index, row in df.iterrows():
    if row['month'] == 1:
        df_new.loc[index, 'previous_month_year'] = '12-' + str(row['year'] - 1)
    else:
        df_new.loc[index, 'previous_month_year'] = str(row['month'] - 1) + '-' + str(row['year'])
df_new['category'] = df['category']
df_new['size'] = df['size']
df_new['sold_current'] = df['sold']
# Look up the previous month's sales for the same category and size (0 if none)
for index, row in df.iterrows():
    sold_previous = df[(df['month'].astype(str) + '-' + df['year'].astype(str)) == df_new.loc[index, 'previous_month_year']]
    sold_previous = sold_previous[sold_previous['category'] == df_new.loc[index, 'category']]
    sold_previous = sold_previous[sold_previous['size'] == df_new.loc[index, 'size']]
    sold_previous = sold_previous['sold'].values
    if sold_previous.size > 0:
        df_new.loc[index, 'sold_previous'] = sold_previous[0]
    else:
        df_new.loc[index, 'sold_previous'] = 0
This would be the output:
current_month_year previous_month_year category size sold_current sold_previous
0 6-2022 5-2022 shirt M 52 45
1 5-2022 4-2022 shirt M 45 0
2 1-2022 12-2021 shirt S 61 89
3 12-2021 11-2021 shirt S 89 0
4 12-2021 11-2021 pant S 72 0
5 7-2022 6-2022 shirt M 42 52
6 8-2022 7-2022 shirt M 55 42
7 8-2022 7-2022 pants 41 9 0
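Filtering the whole frame once per row can get slow as the data grows. One possible variation, sketched here under the assumption that each combination of month, year, category, and size appears at most once, is to build a dictionary lookup first. The names `sold_by_key` and `previous_month` are made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "month": [6, 5, 1, 12, 12, 7, 8, 8],
    "year": [2022, 2022, 2022, 2021, 2021, 2022, 2022, 2022],
    "category": ["shirt", "shirt", "shirt", "shirt", "pant", "shirt", "shirt", "pants"],
    "size": ["M", "M", "S", "S", "S", "M", "M", "41"],
    "sold": [52, 45, 61, 89, 72, 42, 55, 9],
})

# Map (year, month, category, size) -> sold.
# Note the use of df["size"]: the attribute df.size is the element count
# of the frame, not the column named "size".
sold_by_key = {
    (y, m, c, s): sold
    for y, m, c, s, sold in zip(df["year"], df["month"], df["category"], df["size"], df["sold"])
}

def previous_month(year, month):
    """Return (year, month) of the preceding calendar month."""
    return (year - 1, 12) if month == 1 else (year, month - 1)

# Look up each row's previous month for the same category and size; 0 if absent.
df["sold_previous"] = [
    sold_by_key.get((*previous_month(y, m), c, s), 0)
    for y, m, c, s in zip(df["year"], df["month"], df["category"], df["size"])
]
```

Each row then costs a single dictionary lookup instead of three filters over the full frame.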