Pandas DataFrame Python | How to compare one cell with another cell of a copied DataFrame?

Question

I have two identical DataFrames with different names (df_1 and df_2).

Let's say the DataFrames have two columns, Category and Time. For example:

Category	Time
A	2020-02-02 05:05:05.0000
A	2020-02-02 06:06:06.0000
A	2020-02-02 07:07:07.0000
B	2020-02-02 05:05:05.0000
B	2020-02-02 06:06:06.0000
C	2020-02-02 05:05:05.0000
C	2020-02-02 06:06:06.0000

I want the following: if a category in df_1 matches a category in df_2, then a new DataFrame (with columns category, starttime, endtime) should hold, for each category, the first datetime in the starttime column and the last datetime in the endtime column. For category A, for example, starttime would be 2020-02-02 05:05:05.0000 and endtime would be 2020-02-02 07:07:07.0000.

Final result, as a new DataFrame:

Category	Start Time	End Time
A	2020-02-02 05:05:05.0000	2020-02-02 07:07:07.0000
B	2020-02-02 05:05:05.0000	2020-02-02 06:06:06.0000
C	2020-02-02 05:05:05.0000	2020-02-02 06:06:06.0000

How can I achieve this? Please help.

Answer 1

Solution for the original question

final_df = (
    pd.concat([df_1.groupby("CATEGORY").agg(["min", "max"]),
               df_2.groupby("CATEGORY").agg(["min", "max"])],
              join="inner", axis=1)
    .apply(["min", "max"], axis=1)
    .rename(columns={"min": "START TIME", "max": "END TIME"})
)

Explanation

First, group each DataFrame by CATEGORY and keep the min and max of its values. This also sets the index to CATEGORY.
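```
# Passing the aggregations as strings avoids the warning that recent
# pandas versions emit for the built-in min and max functions.
grouped_1 = df_1.groupby("CATEGORY").agg(["min", "max"])
grouped_2 = df_2.groupby("CATEGORY").agg(["min", "max"])
```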
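To make the shape of the grouped result concrete, here is a minimal sketch that builds the question's sample data and inspects what `groupby(...).agg(...)` returns. The upper-case column names `CATEGORY` and `TIME` are the ones this answer assumes; the question itself spells them `Category` and `Time`.

```python
import pandas as pd

# Sample data from the question, with the column names assumed by this answer.
df_1 = pd.DataFrame({
    "CATEGORY": ["A", "A", "A", "B", "B", "C", "C"],
    "TIME": pd.to_datetime([
        "2020-02-02 05:05:05", "2020-02-02 06:06:06", "2020-02-02 07:07:07",
        "2020-02-02 05:05:05", "2020-02-02 06:06:06",
        "2020-02-02 05:05:05", "2020-02-02 06:06:06",
    ]),
})

grouped_1 = df_1.groupby("CATEGORY").agg(["min", "max"])

# The columns form a two-level MultiIndex: (original column, aggregation).
print(grouped_1.columns.tolist())  # [('TIME', 'min'), ('TIME', 'max')]
# The grouping key has become the index.
print(grouped_1.index.name)        # CATEGORY
```

The two-level column index is why the single-DataFrame solution further down selects `["TIME"]` after aggregating.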
Then, do an inner join to keep only the CATEGORY values that appear in both df_1 and df_2. By default, the join is done on the index, which is what we want here (the CATEGORY column of the original DataFrames). Concatenating horizontally gives 4 columns: two min and two max values per row.
```
grouped_both = pd.concat([grouped_1, grouped_2], join="inner", axis=1)
```
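With two identical DataFrames the inner join removes nothing, so its effect is easy to miss. The sketch below uses hypothetical per-category summaries in which category C is missing from the second frame, to show that only the shared categories survive.

```python
import pandas as pd

# Hypothetical per-category summaries (not data from the question).
grouped_1 = pd.DataFrame(
    {
        "min": pd.to_datetime(["2020-02-02 05:05:05"] * 3),
        "max": pd.to_datetime(
            ["2020-02-02 07:07:07", "2020-02-02 06:06:06", "2020-02-02 06:06:06"]
        ),
    },
    index=pd.Index(["A", "B", "C"], name="CATEGORY"),
)
# The second summary lacks category "C".
grouped_2 = grouped_1.drop(index="C")

# axis=1 places the frames side by side, aligned on the index;
# join="inner" keeps only index labels present in both.
grouped_both = pd.concat([grouped_1, grouped_2], join="inner", axis=1)

print(grouped_both.index.tolist())  # ['A', 'B']
print(grouped_both.shape)           # (2, 4)
```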
Finally, keep the min and max values of each row, and rename the columns.
```
final_df = (
    grouped_both.apply(["min", "max"], axis=1)
    .rename(columns={"min": "START TIME", "max": "END TIME"})
)
```
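Putting the three steps together, the following sketch runs end to end on the question's sample data. It assumes df_2 is a plain copy of df_1, as the question describes, and it swaps the row-wise `apply` for `DataFrame.min(axis=1)` and `DataFrame.max(axis=1)`, which is an equivalent reduction chosen here for readability rather than the answer's exact call.

```python
import pandas as pd

df_1 = pd.DataFrame({
    "CATEGORY": ["A", "A", "A", "B", "B", "C", "C"],
    "TIME": pd.to_datetime([
        "2020-02-02 05:05:05", "2020-02-02 06:06:06", "2020-02-02 07:07:07",
        "2020-02-02 05:05:05", "2020-02-02 06:06:06",
        "2020-02-02 05:05:05", "2020-02-02 06:06:06",
    ]),
})
df_2 = df_1.copy()  # assumption: the second frame is an exact copy

# Step 1: per-category min and max (selecting "TIME" keeps the columns flat).
grouped_1 = df_1.groupby("CATEGORY")["TIME"].agg(["min", "max"])
grouped_2 = df_2.groupby("CATEGORY")["TIME"].agg(["min", "max"])

# Step 2: inner join on the CATEGORY index, side by side.
grouped_both = pd.concat([grouped_1, grouped_2], join="inner", axis=1)

# Step 3: earliest and latest timestamp across the four columns of each row.
final_df = pd.DataFrame({
    "START TIME": grouped_both.min(axis=1),
    "END TIME": grouped_both.max(axis=1),
})
print(final_df)
```

For this input the result has one row per category, with A running from 05:05:05 to 07:07:07 and B and C from 05:05:05 to 06:06:06, which matches the table requested in the question.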

NOTE: I assumed you wanted to merge the first and last timestamps of both DataFrames. If you actually wanted the start from df_1 and the end from df_2, the solution would be slightly different.
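
The answer does not spell out that alternative, so the following is only one possible sketch of it: take the earliest time per category from df_1, the latest from df_2, and join the two on the category index. The input values are hypothetical and deliberately differ between the frames so the difference from the merged approach is visible.

```python
import pandas as pd

# Hypothetical inputs: same categories, different timestamps in each frame.
df_1 = pd.DataFrame({
    "CATEGORY": ["A", "A", "B"],
    "TIME": pd.to_datetime(
        ["2020-02-02 05:05:05", "2020-02-02 06:06:06", "2020-02-02 05:05:05"]
    ),
})
df_2 = pd.DataFrame({
    "CATEGORY": ["A", "A", "B"],
    "TIME": pd.to_datetime(
        ["2020-02-02 04:04:04", "2020-02-02 08:08:08", "2020-02-02 09:09:09"]
    ),
})

# Earliest time per category from df_1, latest time per category from df_2.
start = df_1.groupby("CATEGORY")["TIME"].min().rename("START TIME")
end = df_2.groupby("CATEGORY")["TIME"].max().rename("END TIME")

# The inner join on the CATEGORY index keeps only categories present in both.
result = pd.concat([start, end], join="inner", axis=1)
print(result)
```

Here category A starts at 05:05:05 (from df_1) even though df_2 contains an earlier 04:04:04, which the merged solution would have picked up instead.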

Solution for one DataFrame and adding duration

If I understood correctly, you don't need to copy the original DataFrame.

# Group the DataFrame by CATEGORY and keep the min and max values.
# Selecting "TIME" drops that level from the resulting column MultiIndex.
joined_df = df_1.groupby("CATEGORY").agg(["min", "max"])["TIME"]
# Keep only rows where the min differs from the max
# (copy so the new column below is added to an independent frame)
joined_df = joined_df[joined_df["min"] != joined_df["max"]].copy()
# Compute the time delta between min and max as fractional minutes.
# total_seconds() / 60 also works on pandas versions where
# .astype('timedelta64[m]') is no longer supported.
joined_df["DURATION"] = (joined_df["max"] - joined_df["min"]).dt.total_seconds() / 60
# Rename the min and max columns
joined_df = joined_df.rename(columns={"min": "START TIME", "max": "END TIME"})
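
As a usage sketch, the same steps can be wrapped in a small function and run on sample data. The helper name `summarize_durations` and the extra single-row category D are hypothetical, added only to show that categories with identical start and end times are filtered out.

```python
import pandas as pd

def summarize_durations(df: pd.DataFrame) -> pd.DataFrame:
    """Per-category start, end and duration in minutes (hypothetical helper)."""
    out = df.groupby("CATEGORY")["TIME"].agg(["min", "max"])
    # Drop categories whose first and last timestamps coincide.
    out = out[out["min"] != out["max"]].copy()
    # Timedelta converted to fractional minutes.
    out["DURATION"] = (out["max"] - out["min"]).dt.total_seconds() / 60
    return out.rename(columns={"min": "START TIME", "max": "END TIME"})

df_1 = pd.DataFrame({
    "CATEGORY": ["A", "A", "A", "B", "B", "D"],
    "TIME": pd.to_datetime([
        "2020-02-02 05:05:05", "2020-02-02 06:06:06", "2020-02-02 07:07:07",
        "2020-02-02 05:05:05", "2020-02-02 06:06:06",
        "2020-02-02 05:05:05",  # D has a single row, so min == max
    ]),
})

summary = summarize_durations(df_1)
print(summary)
```

Category A spans 2 h 2 min 2 s, so its DURATION is 7322 / 60 minutes (about 122.03), and D does not appear in the output.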

Answer 1
1 | Accepted | 2020-12-08 20:28:10