
Calculate % increase/decrease for a column value in two separate data frames per each unique ID

I have 12 files:

files = ['01_2021.csv', '02_2021.csv', '03_2021.csv', '04_2021.csv', '05_2021.csv', '06_2021.csv', '07_2021.csv', '08_2021.csv', '09_2021.csv', '10_2021.csv', '11_2020.csv', '12_2020.csv']

My CSV file structure:

id    itemName    NonImportantEntries    Entries    SomeOtherEntries
1      item1              27              111             163
2      item2              16               22              98
...
5000

I'm trying to calculate the percentage increase/decrease of the "Entries" value in the latest month's file (in this case 10_2021) relative to the "Entries" value in the previous month's file, for each unique id. Also, note that a given id is not guaranteed to be present in both files.

10_2021.csv:

id    itemName    NonImportantEntries    Entries    SomeOtherEntries
1      item1              27              111             163
2      item2              16               22              98
...
5000

09_2021.csv:

id    itemName    NonImportantEntries    Entries    SomeOtherEntries
1      item1              27               97             163
2      item2              16               57              98
...
5000

For example, with id=1:

111(10_2021.csv) - 97 (09_2021.csv) = 14
14 / 97 (09_2021.csv) = 0.1443 * 100 = 14.43

For example, with id=2:

22(10_2021.csv) - 57 (09_2021.csv) = -35
-35 / 57 (09_2021.csv) = -0.6140 * 100 = -61.40
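
Both worked examples follow the usual percent-change formula, (new - old) / old * 100. A minimal sketch in plain Python that reproduces them; `pct_change` is a hypothetical helper name used only for illustration:

```python
def pct_change(new, old):
    """Percent change from old to new: (new - old) / old * 100."""
    # Raises ZeroDivisionError when old is 0; handling that case is left open.
    return (new - old) / old * 100


# Values taken from the id=1 and id=2 rows shown above.
print(round(pct_change(111, 97), 2))
print(round(pct_change(22, 57), 2))
```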

Desired output is:

id    %differenceLatestMonthToPreviousMonth
1                    14.43%
2                   -61.40%

My code so far:

import pandas as pd

from os import listdir
from os.path import isfile, join

#readMyDirectoryForFiles
mypath= <myDirectoryPath>

list_of_files = [f for f in listdir(mypath) if isfile(join(mypath, f))]

#sortNewestFirst: names look like MM_YYYY.csv, so compare on [YYYY, MM]
list_of_files.sort(key=lambda f: f.split('.')[0].split('_')[::-1], reverse=True)

#GenerateAllFilesInDirList (join adds the path separator if it is missing)
list_of_files = [join(mypath, x) for x in list_of_files]

#ConsiderOnlyLatestTwoMonths
filesNeeded = list_of_files[:2]

#I'm stuck here: map file names to each unique ID and calculate like on examples provided above for id 1 and id 2.

dataframes = [pd.read_csv(fi) for fi in filesNeeded]

Could someone help with this? Thank you in advance.

It sounds like you need to pd.DataFrame.join your two DataFrames on the id column, and then calculate the % difference:

this_month = dataframes[0]
last_month = dataframes[1]

combined = this_month.join(last_month.set_index('id'),
                           on='id',
                           lsuffix='_this_month',
                           rsuffix='_last_month',
                          )

combined['pct_diff_between_months'] = \
    ((combined['Entries_this_month']/combined['Entries_last_month'] - 1)*100)
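
Note that DataFrame.join is a left join by default, so an id that appears only in last month's file is dropped, and an id that appears only in this month's file gets NaN. If every id should be kept, one possible variation is an outer merge. The sketch below is self-contained: it uses small inline frames as stand-ins for the two CSV files (ids 3 and 4 are invented to illustrate the missing-id case), and the column names `pct_diff` and `pct_diff_str` are arbitrary choices.

```python
import pandas as pd

# Hypothetical stand-ins for 10_2021.csv and 09_2021.csv; ids 3 and 4 are
# invented here to show what happens when an id is missing from one file.
this_month = pd.DataFrame({"id": [1, 2, 3], "Entries": [111, 22, 40]})
last_month = pd.DataFrame({"id": [1, 2, 4], "Entries": [97, 57, 10]})

# Outer merge keeps every id from either file; the missing side becomes NaN.
combined = this_month.merge(
    last_month, on="id", how="outer", suffixes=("_this_month", "_last_month")
)

# Percent change relative to last month; NaN where either value is missing.
combined["pct_diff"] = (
    combined["Entries_this_month"] / combined["Entries_last_month"] - 1
) * 100

# Format as in the desired output, leaving missing ids as NaN.
combined["pct_diff_str"] = combined["pct_diff"].map(
    lambda x: f"{x:.2f}%", na_action="ignore"
)

result = combined.set_index("id")[["pct_diff", "pct_diff_str"]]
print(result)
```

A last-month value of 0 would produce inf rather than an error here, so that case may need separate handling depending on what the report should show.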
