
Python: check if two dataframes contain filled cells in the same location

I want to make sure that each cell (x, y) is filled in either DF1 or DF2, but never in both, for all cells in these dataframes. DF1 and DF2 have the same shape, so they contain the same number of cells. If the cells at the same location in DF1 and DF2 are both filled, an exception should be raised to signal that something went wrong.

For some reason I can't wrap my head around it, although it sounds quite easy.

What I've tried:

  • Checking with .notnull() and then comparing the two results. This produces a large boolean frame that is hard to interpret.
  • A double for loop would work, but that does not seem Pythonic enough.
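One way to tame the boolean output of .notnull() is to combine the two masks with & and reduce the result to a single value. The following is a minimal sketch, assuming two small frames with identical index and columns; the data is made up for illustration and is not taken from the CSV files below.

```python
import numpy as np
import pandas as pd

# Hypothetical frames with identical index and columns (illustrative data only).
idx = ["Fri", "Sat"]
cols = ["2022-06-18", "2022-06-19"]
df1 = pd.DataFrame([[1.0, np.nan], [np.nan, 2.0]], index=idx, columns=cols)
df2 = pd.DataFrame([[np.nan, 3.0], [np.nan, 4.0]], index=idx, columns=cols)

# True only where both frames hold a value at the same (row, column).
both_filled = df1.notna() & df2.notna()

# Collapse the whole mask to one boolean: does any cell overlap?
has_overlap = bool(both_filled.to_numpy().any())
print(has_overlap)
```

Here only the cell (Sat, 2022-06-19) is filled in both frames, so the mask contains a single True and the reduced flag reports an overlap.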

See the examples of DF1/DF2 below. The indices and columns are identical; only different parts are filled, and the empty cells contain np.nan. Each cell value is the number of orders placed on a certain day for a certain delivery day. The goal is to condense this into a matrix containing the x-week average from each order day (Mon-Sun) for each delivery day (Mon-Sun).

DF1

DF2


EDIT: text files and expected output

DF1.csv

order_day,2022-06-18,2022-06-19,2022-06-20,2022-06-21,2022-06-22,2022-06-23,2022-06-24,2022-06-25,2022-06-26,2022-06-27,2022-06-28,2022-06-29,2022-06-30,2022-07-01,2022-07-02,2022-07-03,2022-07-04,2022-07-05,2022-07-06,2022-07-07,2022-07-08
Friday,34.0,,214.0,74.0,46.0,21.0,19.0,,,,,,,,,,,,,,
Saturday,,,79.0,154.0,75.0,28.0,16.0,14.0,,,,,,,,,,,,,
Sunday,,,,301.0,183.0,60.0,42.0,25.0,,,,,,,,,,,,,
Monday,,,,49.0,61.0,216.0,104.0,36.0,,28.0,,,,,,,,,,,
Tuesday,,,,,47.0,180.0,77.0,36.0,,17.0,8.0,,,,,,,,,,
Wednesday,,,,,,84.0,200.0,69.0,,58.0,24.0,10.0,,,,,,,,,
Thursday,,,,,,,84.0,148.0,,87.0,37.0,10.0,3.0,,,,,,,,

DF2.csv

order_day,2022-06-18,2022-06-19,2022-06-20,2022-06-21,2022-06-22,2022-06-23,2022-06-24,2022-06-25,2022-06-26,2022-06-27,2022-06-28,2022-06-29,2022-06-30,2022-07-01,2022-07-02,2022-07-03,2022-07-04,2022-07-05,2022-07-06,2022-07-07,2022-07-08
Friday,,,,,,,,44.0,,290.0,86.0,54.0,13.0,16.0,,,,,,,
Saturday,,,,,,,,,,135.0,177.0,125.0,24.0,28.0,8.0,,,,,,
Sunday,,,,,,,,,,,358.0,181.0,58.0,48.0,29.0,,,,,,
Monday,,,,,,,,,,,101.0,156.0,96.0,60.0,32.0,,15.0,,,,
Tuesday,,,,,,,,,,,,3.0,38.0,20.0,6.0,,4.0,2.0,,,
Wednesday,,,,,,,,,,,,,,,,,,,,,
Thursday,,,,,,,,,,,,,,,,,,,,,

Load with pd.read_csv('DF2.csv', index_col='order_day')

Expected output

There is no exact expected output. It could be something like print('No filled cells overlap!'). For this MRE you can be fairly sure that there is no overlap. However, I am going to work with larger date ranges and I don't want to rely on good faith.

Update

A more useful output to analyze:

dups = (pd.concat([df1.set_index('order_day').stack(),
                   df2.set_index('order_day').stack()],
                   keys=['df1', 'df2'], axis=1)
          .loc[lambda x: x.notna().all(axis=1)])
print(dups)

# Output:
                      df1  df2
order_day                     
Fri       2022-06-20  1.0  2.0
Sat       2022-06-18  1.0  3.0
          2022-06-20  3.0  2.0
Tue       2022-06-20  3.0  1.0
Thu       2022-06-19  1.0  3.0

Set up an MRE:

import pandas as pd
import numpy as np

wdays = ['Fri', 'Sat', 'Sun', 'Mon', 'Tue', 'Wed', 'Thu']
dates = ['2022-06-18', '2022-06-19', '2022-06-20']
np.random.seed(2022)
data1 = np.random.choice([1, 2, 3, np.nan], (7, 3), p=[.2, .1, .2, .5])
np.random.seed(2021)
data2 = np.random.choice([1, 2, 3, np.nan], (7, 3), p=[.1, .2, .2, .5])
df1 = pd.DataFrame(data1, wdays, dates).rename_axis('order_day').reset_index()
df2 = pd.DataFrame(data2, wdays, dates).rename_axis('order_day').reset_index()
print(df1)
print(df2)

# df1
  order_day  2022-06-18  2022-06-19  2022-06-20
0       Fri         1.0         3.0         1.0
1       Sat         1.0         NaN         3.0
2       Sun         NaN         NaN         NaN
3       Mon         NaN         NaN         NaN
4       Tue         NaN         NaN         3.0
5       Wed         3.0         3.0         NaN
6       Thu         NaN         1.0         NaN

# df2
  order_day  2022-06-18  2022-06-19  2022-06-20
0       Fri         NaN         NaN         2.0
1       Sat         3.0         NaN         2.0
2       Sun         2.0         NaN         NaN
3       Mon         NaN         1.0         1.0
4       Tue         NaN         NaN         1.0
5       Wed         NaN         NaN         NaN
6       Thu         NaN         3.0         3.0

Old answer

Flatten your two dataframes (stack drops NaN values by default), then concatenate them and check for duplicated index entries:

dups = (pd.concat([df1.set_index('order_day').stack(),
                   df2.set_index('order_day').stack()])
          .loc[lambda x: x.index.duplicated(keep=False)])
print(dups)

# Output:
Series([], dtype: float64)
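The duplicate-index check relies on stack removing NaN values. To my understanding, the newer stack implementation (selected with future_stack=True) keeps them, so a variant that calls dropna() explicitly may be safer across pandas versions. A minimal sketch, assuming both frames are already indexed by order day; the helper name duplicated_cells and the sample data are hypothetical.

```python
import numpy as np
import pandas as pd

def duplicated_cells(df1: pd.DataFrame, df2: pd.DataFrame) -> pd.Series:
    """Return the stacked values whose (row, column) label occurs in both frames."""
    # dropna() is explicit so the result does not depend on stack's NaN handling.
    s1 = df1.stack().dropna()
    s2 = df2.stack().dropna()
    combined = pd.concat([s1, s2])
    # A label that appears twice must come from both frames.
    return combined[combined.index.duplicated(keep=False)]

# Hypothetical frames indexed by order day (illustrative data only).
idx = ["Fri", "Sat"]
cols = ["2022-06-18", "2022-06-19"]
a = pd.DataFrame([[1.0, np.nan], [np.nan, 2.0]], index=idx, columns=cols)
b = pd.DataFrame([[np.nan, 3.0], [np.nan, 4.0]], index=idx, columns=cols)

dups = duplicated_cells(a, b)
print(dups)
```

An empty result means no cell is filled in both frames; otherwise each offending label appears once per frame.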

This does what I want, but it seems to me that it could be done more easily or in a more Pythonic way.

for col in df1.columns:
    for idx in df1.index:
        if pd.notna(df1.loc[idx, col]) and pd.notna(df2.loc[idx, col]):
            raise Exception(f"Cells ({idx = }, {col = }) both contain values.")
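The double loop above can probably be replaced by a single vectorized mask. This is a sketch rather than a definitive implementation: it assumes both frames share the same index and columns, and the helper name assert_no_overlap and the sample data are made up for illustration.

```python
import numpy as np
import pandas as pd

def assert_no_overlap(df1: pd.DataFrame, df2: pd.DataFrame) -> None:
    """Raise ValueError if any cell is filled in both frames (hypothetical helper)."""
    both = df1.notna() & df2.notna()
    # Row/column positions of every cell that is filled in both frames.
    rows, cols = np.nonzero(both.to_numpy())
    if rows.size:
        cells = [(both.index[r], both.columns[c]) for r, c in zip(rows, cols)]
        raise ValueError(f"Cells filled in both frames: {cells}")
    print("No filled cells overlap!")

# Hypothetical frames indexed by order day (illustrative data only).
idx = ["Fri", "Sat"]
cols = ["2022-06-18", "2022-06-19"]
a = pd.DataFrame([[1.0, np.nan], [np.nan, 2.0]], index=idx, columns=cols)
b = pd.DataFrame([[np.nan, 3.0], [np.nan, np.nan]], index=idx, columns=cols)

assert_no_overlap(a, b)  # no cell is filled in both frames
```

Unlike the loop, this reports every overlapping location in one exception instead of stopping at the first one.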

