Calculate columns from multiple CSV files and save results into a new file
I am new to Python and am trying to do the following with python/pandas.
I have four CSV files that look like this (the only difference is the date value in the first column):
First_week.csv:
date id name total_unitCount total_orderCount total_invoiceCount
2020-02-12 1 Guitar 300 600 500
2020-02-12 2 Drums 500 600 500
2020-02-12 3 Piano 700 1000 400
Second_week.csv:
date id name total_unitCount total_orderCount total_invoiceCount
2020-02-05 1 Guitar 300 800 500
2020-02-05 2 Drums 500 300 500
2020-02-05 3 Piano 700 350 400
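The tables above are shown space-separated. If the files really are laid out that way rather than comma-separated, `pd.read_csv` needs a whitespace separator. This minimal sketch assumes that layout and uses `io.StringIO` in place of the real file so it runs on its own; `first_week_text` is a hypothetical stand-in for the file contents.

```python
import io

import pandas as pd

# Hypothetical stand-in for First_week.csv, assumed to be whitespace-delimited
first_week_text = """date id name total_unitCount total_orderCount total_invoiceCount
2020-02-12 1 Guitar 300 600 500
2020-02-12 2 Drums 500 600 500
2020-02-12 3 Piano 700 1000 400
"""

# sep=r"\s+" splits on any run of spaces or tabs; pass the file path for a real file
df1 = pd.read_csv(io.StringIO(first_week_text), sep=r"\s+")
print(df1[["id", "name", "total_orderCount"]])
```

If the files are ordinary comma-separated CSV, the default `pd.read_csv('First_week.csv')` is enough.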
I need to calculate the percentage difference between two numbers from consecutive weekly CSV files (first_week.total_orderCount vs. second_week.total_orderCount, second vs. third, third vs. fourth):
Example calculation: Difference = ((total_orderCount[where date is 2020-02-12] - total_orderCount[where date is 2020-02-05]) / Units[where date is 2020-02-05]) * 100%
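The formula can be written as a small helper. This sketch assumes the denominator ("Units") is the week-2 total_orderCount, since that is what reproduces the -0.25 for Guitar and the 1 for Drums in the expected output; `pct_difference` is a hypothetical name.

```python
def pct_difference(current, previous):
    """Percentage change of `current` relative to `previous`."""
    return (current - previous) / previous * 100


# Guitar: 600 orders in week 1 (2020-02-12), 800 in week 2 (2020-02-05)
print(pct_difference(600, 800))  # -25.0
```

The same function works element-wise on two pandas Series. Dropping the `* 100` gives fractions such as the -0.25 shown in the expected output instead of percentages.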
Then save the results for each week into a new CSV file (here I have only filled in the week1vsweek2 results):
id name %difference_week1vsweek2 %difference_week2vsweek3 %difference_week3vsweek4
1 Guitar -0.25
2 Drums 1
3 Piano 0.65
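Can anyone help me or give me some step-by-step instructions? Thanks in advance!
The general pattern for calculating columns from several CSV files and saving the result to a new file with pandas (file and column names are placeholders):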
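import pandas as pd
# read the input files
df1 = pd.read_csv('First.csv')
df2 = pd.read_csv('Second.csv')
# build the output from a column calculation (rows are paired by position)
output_df = pd.DataFrame()
output_df['result'] = df1['col2'] - df2['col2']  # some column calculation
# write the result; other formats have their own to_* methods (to_excel, to_json, ...)
output_df.to_csv('output.csv', index=False)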
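Subtracting two columns directly pairs rows by position, so the numbers come out wrong if the files list the items in different orders. This sketch pairs rows by `id` and `name` with `DataFrame.merge` and its `suffixes` parameter instead; the small in-memory frames are stand-ins for `pd.read_csv('First_week.csv')` and `pd.read_csv('Second_week.csv')`, and week 2 is deliberately shuffled.

```python
import pandas as pd

# Stand-ins for the two CSV files; week 2 is in a different row order on purpose
week1 = pd.DataFrame({"id": [1, 2, 3],
                      "name": ["Guitar", "Drums", "Piano"],
                      "total_orderCount": [600, 600, 1000]})
week2 = pd.DataFrame({"id": [3, 1, 2],
                      "name": ["Piano", "Guitar", "Drums"],
                      "total_orderCount": [350, 800, 300]})

# Pair rows by id/name rather than by position; overlapping columns get suffixes
merged = week1.merge(week2, on=["id", "name"], suffixes=("_w1", "_w2"))

result = merged[["id", "name"]].copy()
result["%difference_week1vsweek2"] = (
    (merged["total_orderCount_w1"] - merged["total_orderCount_w2"])
    / merged["total_orderCount_w2"] * 100
)
print(result)
```

The default inner join keeps the row order of the left frame (`week1`), so the output is still ordered by week 1.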
Here is working code for the example given in the question:
#import libraries
import pandas as pd
#read files
df1 = pd.read_csv('First_week.csv')
df2 = pd.read_csv('Second_week.csv')
#copy id and name into a new frame (assumes both files list the same ids in the same row order)
df3 = df1[['id', 'name']].copy()
#percentage difference of week 1 relative to week 2
df3['%difference_week1vsweek2'] = (df1['total_orderCount'] - df2['total_orderCount']) / df2['total_orderCount'] * 100
print(df3)
#save results without the pandas index column
df3.to_csv("output.csv", index=False)
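To fill all three columns from four files, the same calculation can run in a loop over consecutive weeks. This is a sketch under assumptions: `weekly_differences` is a hypothetical helper, the names Third_week.csv and Fourth_week.csv are assumed, and because the question only shows data for two weeks, the demo runs on those two.

```python
import pandas as pd


def weekly_differences(frames, column="total_orderCount"):
    """Return id/name plus one %difference column per consecutive pair of weeks.

    frames[0] is week 1, frames[1] is week 2, and so on. Each difference is
    (week i - week i+1) / week i+1 * 100, as in the formula in the question.
    """
    result = frames[0][["id", "name"]].copy()
    for i in range(len(frames) - 1):
        label = f"%difference_week{i + 1}vsweek{i + 2}"
        # Pair this week with the next one by id so row order does not matter
        pair = frames[i][["id", column]].merge(
            frames[i + 1][["id", column]], on="id", suffixes=("_a", "_b"))
        pair[label] = (pair[f"{column}_a"] - pair[f"{column}_b"]) / pair[f"{column}_b"] * 100
        # Left merge keeps every id from week 1, in week 1's order
        result = result.merge(pair[["id", label]], on="id", how="left")
    return result


# With the real files (Third_week.csv and Fourth_week.csv are assumed names):
# files = ["First_week.csv", "Second_week.csv", "Third_week.csv", "Fourth_week.csv"]
# frames = [pd.read_csv(f) for f in files]
week1 = pd.DataFrame({"id": [1, 2, 3], "name": ["Guitar", "Drums", "Piano"],
                      "total_orderCount": [600, 600, 1000]})
week2 = pd.DataFrame({"id": [1, 2, 3], "name": ["Guitar", "Drums", "Piano"],
                      "total_orderCount": [800, 300, 350]})
out = weekly_differences([week1, week2])
print(out)
# out.to_csv("output.csv", index=False)
```

With four frames the result has the three columns %difference_week1vsweek2, %difference_week2vsweek3 and %difference_week3vsweek4, matching the layout requested in the question.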
Hope this helps.