Pandas Dataframe Python | How to compare a cell with another cell of a copied dataframe?

I have two identical dataframes with different names (df_1 and df_2).

Let's say the dataframes have two columns, Category and Time. For example:

Category  Time
A         2020-02-02 05:05:05.0000
A         2020-02-02 06:06:06.0000
A         2020-02-02 07:07:07.0000
B         2020-02-02 05:05:05.0000
B         2020-02-02 06:06:06.0000
C         2020-02-02 05:05:05.0000
C         2020-02-02 06:06:06.0000

I want the following condition: if a category in df_1 matches a category in df_2, then that category should get one row in a new dataframe (with columns category, starttime, endtime). For category A, for example, I want to put the first datetime (2020-02-02 05:05:05.0000) in starttime and the last datetime (2020-02-02 07:07:07.0000) in endtime.

Final result (new dataframe):

Category  Start Time                End Time
A         2020-02-02 05:05:05.0000  2020-02-02 07:07:07.0000
B         2020-02-02 05:05:05.0000  2020-02-02 06:06:06.0000
C         2020-02-02 05:05:05.0000  2020-02-02 06:06:06.0000

How can I achieve this? Please help.

Solution for the original question

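# The outer parentheses let the method chain span several lines.
# String names "min"/"max" avoid the pandas warning about built-in callables.
(pd.concat([df_1.groupby("CATEGORY").agg(["min", "max"]),
            df_2.groupby("CATEGORY").agg(["min", "max"])],
           join="inner", axis=1)
   .apply(["min", "max"], axis=1)
   .rename(columns={"min": "START TIME", "max": "END TIME"}))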

Explanation

  1. First, you group each DataFrame by CATEGORY and keep the min and max of its values for each category. This also sets the index to CATEGORY.

     grouped_1 = df_1.groupby("CATEGORY").agg(["min", "max"])
     grouped_2 = df_2.groupby("CATEGORY").agg(["min", "max"])
  2. Then, you do an inner join to keep only the CATEGORY values that are in both df_1 and df_2. By default, the inner join is done on the index, which is what we want here (the index holds the CATEGORY column of our original DataFrames). You concatenate horizontally, getting 4 columns: two min and two max values per row.

     grouped_both = pd.concat([grouped_1, grouped_2], join="inner", axis=1)
  3. You keep the overall min and max values of each row, and rename the columns.

     final_df = grouped_both.apply(["min", "max"], axis=1).rename(columns={"min": "START TIME", "max": "END TIME"})
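
The three steps above can be tried end to end with a small sketch. It is a variant, not the exact code of this answer: the sample data is rebuilt from the question, the column names CATEGORY and TIME are assumed as in the answer, and the suffixes `_1`/`_2` are added only for illustration so the four joined columns have distinct names. Step 3 takes the row-wise min of the two mins and max of the two maxes, which gives the same result as the min and max over all four columns.

```python
import pandas as pd

# Sample data rebuilt from the question; df_2 is a copy of df_1.
df_1 = pd.DataFrame({
    "CATEGORY": ["A", "A", "A", "B", "B", "C", "C"],
    "TIME": pd.to_datetime([
        "2020-02-02 05:05:05", "2020-02-02 06:06:06", "2020-02-02 07:07:07",
        "2020-02-02 05:05:05", "2020-02-02 06:06:06",
        "2020-02-02 05:05:05", "2020-02-02 06:06:06",
    ]),
})
df_2 = df_1.copy()

# Step 1: per-category min and max. Selecting "TIME" keeps the columns flat;
# the suffixes (an illustrative choice) keep the names distinct after joining.
grouped_1 = df_1.groupby("CATEGORY")["TIME"].agg(["min", "max"]).add_suffix("_1")
grouped_2 = df_2.groupby("CATEGORY")["TIME"].agg(["min", "max"]).add_suffix("_2")

# Step 2: inner join on the index keeps only categories present in both.
grouped_both = pd.concat([grouped_1, grouped_2], join="inner", axis=1)

# Step 3: earliest start and latest end per row.
final_df = pd.DataFrame({
    "START TIME": grouped_both[["min_1", "min_2"]].min(axis=1),
    "END TIME": grouped_both[["max_1", "max_2"]].max(axis=1),
})
print(final_df)
```

With identical DataFrames, as in the question, the result has one row per category with its first and last timestamp.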

NOTE: I assumed you wanted to merge the first and last timestamps of both DataFrames. If you truly wanted the start from df_1 and the end from df_2, it would be a slightly different solution.
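
One possible reading of that variant, as a rough sketch: take the earliest TIME per category from df_1 and the latest from df_2, then inner-join the two. The sample data here is hypothetical (df_2 deliberately differs from df_1 so the effect is visible), and the names `start`, `end` and `result` are illustrative.

```python
import pandas as pd

# Hypothetical data: df_2 has no category C and different times for A,
# so both the inner join and the df_1/df_2 split show up in the result.
df_1 = pd.DataFrame({
    "CATEGORY": ["A", "A", "B", "C"],
    "TIME": pd.to_datetime([
        "2020-02-02 05:05:05", "2020-02-02 06:06:06",
        "2020-02-02 05:05:05", "2020-02-02 05:05:05",
    ]),
})
df_2 = pd.DataFrame({
    "CATEGORY": ["A", "A", "B"],
    "TIME": pd.to_datetime([
        "2020-02-02 04:04:04", "2020-02-02 07:07:07",
        "2020-02-02 06:06:06",
    ]),
})

# Start comes only from df_1, end only from df_2.
start = df_1.groupby("CATEGORY")["TIME"].min().rename("START TIME")
end = df_2.groupby("CATEGORY")["TIME"].max().rename("END TIME")

# Inner join on the CATEGORY index drops categories missing from either side.
result = pd.concat([start, end], join="inner", axis=1)
print(result)
```

Here category A starts at 05:05:05 (the earlier 04:04:04 in df_2 is ignored) and category C is dropped because it only exists in df_1.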

Solution for one DataFrame and adding duration

If I understood correctly, then you don't need to copy the original DataFrame.

# Group the DataFrame by CATEGORY and keep the min and max values
# We also need to get rid of the newly created MultiIndex level "TIME"
joined_df = df_1.groupby("CATEGORY").agg(["min", "max"])["TIME"]
# Keep only rows where the min is different from the max
# (.copy() avoids a SettingWithCopyWarning when a column is added below)
joined_df = joined_df[joined_df["min"] != joined_df["max"]].copy()
# Calculate the time deltas between min and max,
# then convert them to a float number of minutes
# (pandas 2.x no longer accepts astype('timedelta64[m]'))
joined_df["DURATION"] = (joined_df["max"] - joined_df["min"]).dt.total_seconds() / 60
# We rename the columns min and max
joined_df = joined_df.rename(columns={"min": "START TIME", "max": "END TIME"})
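
As an alternative sketch of the same idea, pandas named aggregation can produce the final column names directly, which avoids both the MultiIndex and the rename step. The sample data is hypothetical: category D is added with a single row to show the filter, and `summary` is an illustrative name.

```python
import pandas as pd

# Hypothetical sample data: category D has a single row, so its
# start equals its end and the filter below drops it.
df_1 = pd.DataFrame({
    "CATEGORY": ["A", "A", "A", "B", "B", "D"],
    "TIME": pd.to_datetime([
        "2020-02-02 05:05:05", "2020-02-02 06:06:06", "2020-02-02 07:07:07",
        "2020-02-02 05:05:05", "2020-02-02 06:06:06",
        "2020-02-02 05:05:05",
    ]),
})

# Named aggregation: output column name -> (input column, function).
# The dict is unpacked because the output names contain spaces.
summary = df_1.groupby("CATEGORY").agg(**{
    "START TIME": ("TIME", "min"),
    "END TIME": ("TIME", "max"),
})

# Drop categories whose start equals their end.
summary = summary[summary["START TIME"] != summary["END TIME"]].copy()

# Duration as a float number of minutes.
summary["DURATION"] = (
    summary["END TIME"] - summary["START TIME"]
).dt.total_seconds() / 60
print(summary)
```

If whole minutes are wanted instead of fractional ones, the DURATION column could be rounded or floored afterwards.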

