Python dataset - how to compare two excel files (one from previous day and other from today) to show the new records and deleted records?
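I want to compare two Excel files that each have three columns. One file is from the previous day and the other is from today (both have the same column names). How can I compare the two to show what was added and what was deleted since the previous day? Note: I only want to look at the first column (Stack Name). For example, if stack name xyz is in 08102022.csv but not in 08112022.csv, I want a table that outputs it and marks it as a deleted stack. Conversely, if a stack is not in yesterday's csv but is in today's, I want it in the output, marked as a new stack.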
newstack_additions = pd.concat([dfprevious,dftoday]).drop_duplicates(subset = ['Stack Name'], keep=False)
print(newstack_additions)
newstack_additions["Stack Change Type"] = "New Stack"
newstack_additions['Last Modified'] = pd.to_datetime(newstack_additions['Last Modified'], format= '%m/%d/%Y-%H:%M:%S')
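The code above only shows the rows that were added, but logically it just returns whatever has no duplicate after concatenating the two DataFrames. It does not treat anything as "new" or "deleted".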
This is 08102022.csv
Stack Name ... Last Modified
0 prod/account/cloudformation/AWSAccountBase.yaml ... 03/15/2022-02:16:48
1 prod/account/cloudformation/AWSAccountBaseAddi... ... 03/26/2022-02:57:56
2 prod/account/cloudformation/AWSAccountCloudCus... ... 03/04/2022-02:14:01
3 prod/account/cloudformation/AWSAccountCloudCus... ... 09/03/2020-02:11:29
4 prod/account/cloudformation/AWSAccountInfo.yaml ... 09/03/2020-02:11:29
... ... ...
3139 prod/v003/util/unix-engineering/SSMAutomationR... ... 05/16/2020-00:51:32
3140 prod/v003/util/unix-engineering/SSMPetsBSCReme... ... 05/16/2020-00:51:32
3141 prod/v003/util/unix-engineering/SSMSudoStateMa... ... 05/16/2020-00:51:32
3142 prod/v003/util/unix-engineering/SudoLambdaDepl... ... 05/16/2020-00:51:32
3143 prod/v003/util/unix-engineering/linux_bsc_reme... ... 05/16/2020-00:51:32
[3144 rows x 3 columns]
This is 08112022.csv
Stack Name ... Last Modified
0 prod/account/cloudformation/AWSAccountBase.yaml ... 03/15/2022-02:16:48
1 prod/account/cloudformation/AWSAccountBaseAddi... ... 03/26/2022-02:57:56
2 prod/account/cloudformation/AWSAccountCloudCus... ... 03/04/2022-02:14:01
3 prod/account/cloudformation/AWSAccountCloudCus... ... 09/03/2020-02:11:29
4 prod/account/cloudformation/AWSAccountInfo.yaml ... 09/03/2020-02:11:29
... ... ...
3140 prod/v003/util/unix-engineering/SSMAutomationR... ... 05/16/2020-00:51:32
3141 prod/v003/util/unix-engineering/SSMPetsBSCReme... ... 05/16/2020-00:51:32
3142 prod/v003/util/unix-engineering/SSMSudoStateMa... ... 05/16/2020-00:51:32
3143 prod/v003/util/unix-engineering/SudoLambdaDepl... ... 05/16/2020-00:51:32
3144 prod/v003/util/unix-engineering/linux_bsc_reme... ... 05/16/2020-00:51:32
[3145 rows x 3 columns]
What I want (showing the row below, which is not in 08102022.csv and has been added to 08112022.csv):
Stack Name ... Stack Change Type
1700 prod/landing-zone/splunk/SplunkDataIngestion.yaml ... New Stack
[1 rows x 4 columns]
Likewise, I want to show what is in 08102022.csv but no longer in 08112022.csv.
Try this:
df2[~df2.index.isin(df1.index)]
where df2 is the latest one (today's file)
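Note that the expression above compares row index labels, not stack names. Since the question asks for a comparison on the `Stack Name` column, a minimal sketch of the same `isin` idea applied to that column might look like the following. The small frames and their values are made up for illustration; `df1` stands for the previous day's data and `df2` for today's, and the "Deleted Stack" label is an assumed counterpart to the question's "New Stack".

```python
import pandas as pd

# Hypothetical sample data standing in for the two CSV files.
df1 = pd.DataFrame({"Stack Name": ["a.yaml", "b.yaml", "c.yaml"],
                    "Last Modified": ["03/15/2022-02:16:48"] * 3})
df2 = pd.DataFrame({"Stack Name": ["a.yaml", "c.yaml", "d.yaml"],
                    "Last Modified": ["03/15/2022-02:16:48"] * 3})

# Rows in today's file whose stack name is absent from the previous day's file.
new_stacks = df2[~df2["Stack Name"].isin(df1["Stack Name"])].copy()
new_stacks["Stack Change Type"] = "New Stack"

# Rows in the previous day's file whose stack name is absent from today's file.
deleted_stacks = df1[~df1["Stack Name"].isin(df2["Stack Name"])].copy()
deleted_stacks["Stack Change Type"] = "Deleted Stack"

# One table listing both kinds of change.
changes = pd.concat([new_stacks, deleted_stacks], ignore_index=True)
print(changes)
```

With real files, `df1` and `df2` would presumably come from `pd.read_csv("08102022.csv")` and `pd.read_csv("08112022.csv")`.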
Another option:
x = pd.concat([df1, df2])
y = x.drop_duplicates(keep=False, inplace=False)
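As written, `drop_duplicates(keep=False)` compares whole rows, and the result does not say which file each surviving row came from. One possible variation, sketched below with made-up data, tags each frame with its origin before concatenating and restricts the duplicate check to `Stack Name`. The `Source` column and the label strings are assumptions for illustration, and the sketch assumes stack names are unique within each file.

```python
import pandas as pd

# Hypothetical sample data; df1 is the previous day, df2 is today.
df1 = pd.DataFrame({"Stack Name": ["a.yaml", "b.yaml"]})
df2 = pd.DataFrame({"Stack Name": ["a.yaml", "d.yaml"]})

# Tag each row with the file it came from, then stack the frames.
x = pd.concat([df1.assign(Source="previous"), df2.assign(Source="today")],
              ignore_index=True)

# Keep only stack names that appear in exactly one of the two files.
y = x.drop_duplicates(subset=["Stack Name"], keep=False).copy()

# Translate the origin into a change type.
y["Stack Change Type"] = y["Source"].map(
    {"previous": "Deleted Stack", "today": "New Stack"}
)
print(y)
```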
It seems you need to find the difference between the two DataFrames. Possible answer: Python Pandas - Find difference between two data frames
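One common way to get such a difference is an outer merge with `indicator=True`, which adds a `_merge` column saying whether each key was found in the left frame, the right frame, or both. The following is only a sketch under assumptions: the sample frames are invented, `previous` and `today` are hypothetical names for the two days' data, and stack names are assumed to be unique within each file.

```python
import pandas as pd

# Hypothetical sample data for the two days.
previous = pd.DataFrame({"Stack Name": ["a.yaml", "b.yaml"]})
today = pd.DataFrame({"Stack Name": ["a.yaml", "d.yaml"]})

# Outer merge on the key column; _merge records where each key was found.
merged = previous[["Stack Name"]].merge(
    today[["Stack Name"]], on="Stack Name", how="outer", indicator=True
)

# Drop keys present in both files and label the rest.
diff = merged[merged["_merge"] != "both"].copy()
diff["Stack Change Type"] = diff["_merge"].astype(str).map(
    {"left_only": "Deleted Stack", "right_only": "New Stack"}
)
print(diff[["Stack Name", "Stack Change Type"]])
```

Here `left_only` means the stack exists only in the previous day's file (deleted) and `right_only` means it exists only in today's file (new).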