
Comparing values of same column depending on values in different column from the same pandas dataframe

Here is my data frame:

month  year  category  size  sold
6      2022  shirt     M     52
5      2022  shirt     M     45
1      2022  shirt     S     61
12     2021  shirt     S     89
12     2021  pant      S     72
7      2022  shirt     M     42
8      2022  shirt     M     55
8      2022  pants     41    9

What I would like is to roll up the previous month into another column.

Like this:

current_month_year  previous_month_year  category  size  sold_current  sold_previous
6-2022              5-2022               shirt     M     52            45
1-2022              12-2021              shirt     S     61            89
12-2021                                  pant      S     72            0
8-2022              7-2022               shirt     M     55            42
8-2022                                   pant      41    9             0

How would I do this?

I have no idea how to do this, so I don't have any code to show.

I will start by creating the column current_month_year based on the columns month and year:

df['current_month_year'] = df['month'].astype(str) + '-' + df['year'].astype(str)

[Out]:
   month  year category size  sold current_month_year
0      6  2022    shirt    M    52             6-2022
1      5  2022    shirt    M    45             5-2022
2      1  2022    shirt    S    61             1-2022
3     12  2021    shirt    S    89            12-2021
4     12  2021     pant    S    72            12-2021

Next, let's focus on the column previous_month_year. The approach is similar at first: start from the same month-year string, then overwrite it according to the conditions that follow.

df['previous_month_year'] = df['month'].astype(str) + '-' + df['year'].astype(str)

[Out]:
   month  year category size  sold current_month_year previous_month_year
0      6  2022    shirt    M    52             6-2022              6-2022
1      5  2022    shirt    M    45             5-2022              5-2022
2      1  2022    shirt    S    61             1-2022              1-2022
3     12  2021    shirt    S    89            12-2021             12-2021
4     12  2021     pant    S    72            12-2021             12-2021

Then let's add the conditions:

• If month is 1 (January), the previous month is 12 (December) of the previous year

df.loc[df['month'] == 1, 'previous_month_year'] = '12-' + (df['year'] - 1).astype(str)

[Out]:
   month  year category size  sold current_month_year previous_month_year
0      6  2022    shirt    M    52             6-2022              6-2022
1      5  2022    shirt    M    45             5-2022              5-2022
2      1  2022    shirt    S    61             1-2022             12-2021
3     12  2021    shirt    S    89            12-2021             12-2021
4     12  2021     pant    S    72            12-2021             12-2021

• If month is not 1 (January), subtract 1 from the month and keep the year unchanged

df.loc[df['month'] != 1, 'previous_month_year'] = (df['month'] - 1).astype(str) + '-' + df['year'].astype(str)

[Out]:
   month  year category size  sold current_month_year previous_month_year
0      6  2022    shirt    M    52             6-2022              5-2022
1      5  2022    shirt    M    45             5-2022              4-2022
2      1  2022    shirt    S    61             1-2022             12-2021
3     12  2021    shirt    S    89            12-2021             11-2021
4     12  2021     pant    S    72            12-2021             11-2021
5      7  2022    shirt    M    42             7-2022              6-2022
6      8  2022    shirt    M    55             8-2022              7-2022
7      8  2022    pants   41     9             8-2022              7-2022
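As an alternative to the two conditional assignments, here is a minimal sketch that lets pandas do the month arithmetic. It assumes the month and year columns hold valid integers; the variable names current and previous are only illustrative, and day=1 is a placeholder needed to build a date.

```python
import pandas as pd

# Month and year columns copied from the question.
df = pd.DataFrame({
    'month': [6, 5, 1, 12, 12, 7, 8, 8],
    'year': [2022, 2022, 2022, 2021, 2021, 2022, 2022, 2022],
})

# Assemble a date from year/month (day=1 is a placeholder) and convert it to a monthly Period.
current = pd.to_datetime(df[['year', 'month']].assign(day=1)).dt.to_period('M')

# Subtracting 1 from a monthly Period steps back one month, rolling over the year in January.
previous = current - 1

# Format both as "month-year" strings, matching the labels used above.
df['current_month_year'] = current.dt.month.astype(str) + '-' + current.dt.year.astype(str)
df['previous_month_year'] = previous.dt.month.astype(str) + '-' + previous.dt.year.astype(str)

print(df)
```

With this, January needs no special case because the Period subtraction handles the year change.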

Finally, as OP wants the column sold_previous to show the number of items sold in the previous_month_year of each row, the following will do the work (if there is no row for the previous month, the value 0 is used):

df['sold_previous'] = 0
for i in df.index:
    # Rows with the same category and size whose month is this row's previous month
    match = df.loc[(df['current_month_year'] == df.loc[i, 'previous_month_year'])
                   & (df['category'] == df.loc[i, 'category'])
                   & (df['size'] == df.loc[i, 'size']), 'sold']
    if not match.empty:
        df.loc[i, 'sold_previous'] = match.iloc[0]

[Out]:
   month  year category  ... current_month_year  previous_month_year sold_previous
0      6  2022    shirt  ...             6-2022               5-2022            45
1      5  2022    shirt  ...             5-2022               4-2022             0
2      1  2022    shirt  ...             1-2022              12-2021            89
3     12  2021    shirt  ...            12-2021              11-2021             0
4     12  2021     pant  ...            12-2021              11-2021             0
5      7  2022    shirt  ...             7-2022               6-2022            52
6      8  2022    shirt  ...             8-2022               7-2022            42
7      8  2022    pants  ...             8-2022               7-2022             0
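The row-by-row lookup can also be expressed as a self-merge. This is a minimal sketch, not a definitive implementation: it assumes there is at most one row per combination of category, size and month (otherwise the merge would duplicate rows), and the names previous and out are only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'month': [6, 5, 1, 12, 12, 7, 8, 8],
    'year': [2022, 2022, 2022, 2021, 2021, 2022, 2022, 2022],
    'category': ['shirt', 'shirt', 'shirt', 'shirt', 'pant', 'shirt', 'shirt', 'pants'],
    'size': ['M', 'M', 'S', 'S', 'S', 'M', 'M', '41'],
    'sold': [52, 45, 61, 89, 72, 42, 55, 9],
})

# Same labels as in the steps above.
df['current_month_year'] = df['month'].astype(str) + '-' + df['year'].astype(str)
prev_month = (df['month'] - 1).where(df['month'] != 1, 12)
prev_year = df['year'].where(df['month'] != 1, df['year'] - 1)
df['previous_month_year'] = prev_month.astype(str) + '-' + prev_year.astype(str)

# Lookup table: each row's sales, keyed by the month in which they happened.
previous = df[['category', 'size', 'current_month_year', 'sold']].rename(
    columns={'current_month_year': 'previous_month_year', 'sold': 'sold_previous'}
)

# A left join keeps every original row; rows without a previous month get NaN, replaced by 0.
out = df.merge(previous, on=['category', 'size', 'previous_month_year'], how='left')
out['sold_previous'] = out['sold_previous'].fillna(0).astype(int)

print(out)
```

Matching on category and size as well as the month keeps shirts from being compared with pants sold in the same month.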

If one wants to rename a column, for example sold to sold_current, one can do the following:

df = df.rename(columns={'sold': 'sold_current'})

You can create a new DataFrame with the output columns you want and iterate over the original DataFrame, filtering by size and category in the adjacent month to get the previous sales. The current sales and the remaining columns are easy to add: copy them from one DataFrame to the other and apply a small transformation to calculate the previous month.

I would do something like this:

import pandas as pd

df = pd.DataFrame({
    'month': [6, 5, 1, 12, 12, 7, 8, 8],
    'year': [2022, 2022, 2022, 2021, 2021, 2022, 2022, 2022],
    'category': ['shirt', 'shirt', 'shirt', 'shirt', 'pant', 'shirt', 'shirt', 'pants'],
    'size': ['M', 'M', 'S', 'S', 'S', 'M', 'M', '41'],
    'sold': [52, 45, 61, 89, 72, 42, 55, 9],
})

df_new = pd.DataFrame(columns=['current_month_year', 'previous_month_year', 'category', 'size', 'sold_current', 'sold_previous'])

df_new['current_month_year'] = df['month'].astype(str) + '-' + df['year'].astype(str)
for index, row in df.iterrows():
    if row['month'] == 1:
        df_new.loc[index, 'previous_month_year'] = '12-' + str(row['year'] - 1)
    else:
        df_new.loc[index, 'previous_month_year'] = str(row['month'] - 1) + '-' + str(row['year'])

df_new['category'] = df['category']
df_new['size'] = df['size']
df_new['sold_current'] = df['sold']

for index, row in df.iterrows():
    sold_previous = df[(df['month'].astype(str) + '-' + df['year'].astype(str)) == df_new.loc[index, 'previous_month_year']]
    sold_previous = sold_previous[sold_previous['category'] == df_new.loc[index, 'category']]
    sold_previous = sold_previous[sold_previous['size'] == df_new.loc[index, 'size']]
    sold_previous = sold_previous['sold'].values
    if sold_previous.size > 0:
        df_new.loc[index, 'sold_previous'] = sold_previous[0]
    else:
        df_new.loc[index, 'sold_previous'] = 0

This would be the output:

   current_month_year  previous_month_year  category  size  sold_current  sold_previous
0  6-2022              5-2022               shirt     M     52            45
1  5-2022              4-2022               shirt     M     45            0
2  1-2022              12-2021              shirt     S     61            89
3  12-2021             11-2021              shirt     S     89            0
4  12-2021             11-2021              pant      S     72            0
5  7-2022              6-2022               shirt     M     42            52
6  8-2022              7-2022               shirt     M     55            42
7  8-2022              7-2022               pants     41    9             0
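If the repeated filtering inside the loop becomes slow on a larger DataFrame, the same sold_previous values can be looked up from a dictionary. The sketch below is only one possible variant: month_no and sold_by_key are hypothetical names, and it assumes at most one row per category, size and month (a later duplicate would overwrite an earlier one in the dictionary).

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'month': [6, 5, 1, 12, 12, 7, 8, 8],
    'year': [2022, 2022, 2022, 2021, 2021, 2022, 2022, 2022],
    'category': ['shirt', 'shirt', 'shirt', 'shirt', 'pant', 'shirt', 'shirt', 'pants'],
    'size': ['M', 'M', 'S', 'S', 'S', 'M', 'M', '41'],
    'sold': [52, 45, 61, 89, 72, 42, 55, 9],
})

# Linear month number: consecutive months differ by exactly 1, even across a year boundary.
month_no = df['year'] * 12 + df['month']

# Map (category, size, month number) to the units sold in that month.
sold_by_key = dict(zip(zip(df['category'], df['size'], month_no), df['sold']))

# For each row, look up the same category and size one month earlier; default to 0.
df['sold_previous'] = [
    sold_by_key.get((category, size, m - 1), 0)
    for category, size, m in zip(df['category'], df['size'], month_no)
]

print(df)
```

Comparing integers instead of "month-year" strings also avoids the January special case, since December 2021 and January 2022 are simply one apart.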


