Calculate % increase/decrease for a column value in two separate DataFrames for each unique ID
I have 12 files:
files = ['01_2021.csv', '02_2021.csv', '03_2021.csv', '04_2021.csv', '05_2021.csv', '06_2021.csv', '07_2021.csv', '08_2021.csv', '09_2021.csv', '10_2021.csv', '11_2020.csv', '12_2020.csv']
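Because the names follow a MM_YYYY pattern, sorting them as plain strings does not give chronological order: 11_2020.csv and 12_2020.csv sort after 10_2021.csv, and 01_2021.csv comes first. Here is a minimal sketch that assumes every file name has exactly the form MM_YYYY.csv. It parses each name with datetime.strptime and returns the two most recent files, newest first. The helper name latest_two is hypothetical:

```python
from datetime import datetime

files = ['01_2021.csv', '02_2021.csv', '03_2021.csv', '04_2021.csv',
         '05_2021.csv', '06_2021.csv', '07_2021.csv', '08_2021.csv',
         '09_2021.csv', '10_2021.csv', '11_2020.csv', '12_2020.csv']

def latest_two(names):
    """Return the two most recent MM_YYYY.csv names, newest first (hypothetical helper)."""
    # Drop the '.csv' extension and parse 'MM_YYYY' into a datetime for chronological sorting
    by_date = sorted(names,
                     key=lambda n: datetime.strptime(n.rsplit('.', 1)[0], '%m_%Y'),
                     reverse=True)
    return by_date[:2]

print(sorted(files)[:2])   # plain string sort: ['01_2021.csv', '02_2021.csv']
print(latest_two(files))   # ['10_2021.csv', '09_2021.csv']
```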
My CSV file structure:
id itemName NonImportantEntries Entries SomeOtherEntries
1 item1 27 111 163
2 item2 16 22 98
...
5000
I'm trying to calculate, for each unique id, the % increase/decrease of the "Entries" value in the latest month's file (in this case 10_2021.csv) relative to the "Entries" value in the previous month's file. Also note that a unique id is not guaranteed to be present in both files.
10_2021.csv:
id itemName NonImportantEntries Entries SomeOtherEntries
1 item1 27 111 163
2 item2 16 22 98
...
5000
09_2021.csv:
id itemName NonImportantEntries Entries SomeOtherEntries
1 item1 27 97 163
2 item2 16 57 98
...
5000
For example, with id=1:
111 (10_2021.csv) - 97 (09_2021.csv) = 14
14 / 97 (09_2021.csv) ≈ 0.1443, and 0.1443 * 100 = 14.43
For example, with id=2:
22 (10_2021.csv) - 57 (09_2021.csv) = -35
-35 / 57 (09_2021.csv) ≈ -0.6140, and -0.6140 * 100 = -61.40
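Both worked examples apply the formula (latest - previous) / previous * 100. Below is that formula as a plain Python function, checked against the id=1 and id=2 numbers above. The name pct_change is hypothetical, and the zero check is an added assumption, since the question does not say what should happen when the previous value is 0:

```python
def pct_change(latest, previous):
    """Percentage change from previous to latest: (latest - previous) / previous * 100."""
    if previous == 0:
        # Assumption: treat a zero baseline as undefined rather than returning inf
        raise ValueError("previous value is 0; percentage change is undefined")
    return (latest - previous) / previous * 100

print(round(pct_change(111, 97), 2))  # 14.43
print(round(pct_change(22, 57), 2))   # -61.4
```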
Desired output is:
id %differenceLatestMonthToPreviousMonth
1 14.43%
2 -61.40%
My code so far:
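import pandas as pd
from os import listdir
from os.path import isfile, join

# Read my directory for files
mypath = '<myDirectoryPath>'  # placeholder: replace with the real directory
list_of_files = [f for f in listdir(mypath) if isfile(join(mypath, f))]

# Sort chronologically, newest first. The names are MM_YYYY.csv, so a plain sort()
# would put 01_2021.csv and 02_2021.csv on top instead of the latest two months
def month_key(name):
    month, year = name.split('.')[0].split('_')
    return int(year), int(month)

list_of_files.sort(key=month_key, reverse=True)

# Generate the full path of every file in the directory list
list_of_files = [join(mypath, x) for x in list_of_files]

# Consider only the latest two months
filesNeeded = list_of_files[:2]
# I'm stuck here: map file names to each unique ID and calculate as in the examples above for id 1 and id 2.
dataframes = [pd.read_csv(fi) for fi in filesNeeded]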
Could someone help with this? Thank you in advance.
It sounds like you need to pd.DataFrame.join your two DataFrames on your id column, and then calculate the % difference:
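# Assumes dataframes is ordered newest first: [0] = latest month, [1] = previous month
this_month = dataframes[0]
last_month = dataframes[1]
# Left join on 'id': ids missing from last month get NaN in the *_last_month columns,
# and ids that exist only in last month are dropped
combined = this_month.join(last_month.set_index('id'),
                           on='id',
                           lsuffix='_this_month',
                           rsuffix='_last_month',
                           )
# (this / last - 1) * 100 is the same as (this - last) / last * 100
combined['pct_diff_between_months'] = \
    ((combined['Entries_this_month'] / combined['Entries_last_month'] - 1) * 100)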
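Here is a self-contained sketch of the same approach on toy data. Ids 1 and 2 use the example values from the question; ids 3 and 4 are made up to show an id missing from one month. The sketch swaps DataFrame.join for pd.merge with how='outer' and indicator=True. That way, ids that appear in only one file are kept and flagged in the _merge column rather than dropped: with the left join above, ids present only in last month's file disappear. The result is also formatted as a percentage string to match the desired output:

```python
import pandas as pd

# Toy data: ids 1 and 2 use the question's values; ids 3 and 4 are hypothetical
this_month = pd.DataFrame({'id': [1, 2, 3], 'Entries': [111, 22, 40]})
last_month = pd.DataFrame({'id': [1, 2, 4], 'Entries': [97, 57, 10]})

# Outer merge keeps every id from either month; _merge records where each id was found
combined = pd.merge(this_month, last_month, on='id', how='outer',
                    suffixes=('_this_month', '_last_month'), indicator=True)

# (this - last) / last * 100; NaN where the id is missing from either month
combined['pct_diff'] = ((combined['Entries_this_month'] - combined['Entries_last_month'])
                        / combined['Entries_last_month'] * 100)

# Format like the desired output, e.g. '14.43%'; leave missing values empty
combined['%differenceLatestMonthToPreviousMonth'] = combined['pct_diff'].map(
    lambda x: '' if pd.isna(x) else f'{x:.2f}%')

print(combined[['id', '_merge', '%differenceLatestMonthToPreviousMonth']])
```

Whether ids found in only one month should be kept, dropped (how='inner'), or reported separately is a choice the question leaves open. The _merge column makes any of those easy to filter afterwards.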