How to find first occurrence for each id based on datetime column with pandas?

I have seen a lot of similar questions, but didn't quite find an answer to my problem. Let's say I have a df:

    sample_id     tested_at   test_value
            1    2020-07-21            5
            1    2020-07-22            4
            1    2020-07-23            6
            2    2020-07-26            6
            2    2020-07-28            5
            3    2020-07-22            4
            3    2020-07-27            4
            3    2020-07-30            6 

The df is already sorted in ascending order by the tested_at column. I now need to add another column, first_test, which would hold the first test value for each sample_id on every row, regardless of whether it is the highest or not. The output should be:

    sample_id     tested_at   test_value   first_test
            1    2020-07-21            5            5
            1    2020-07-22            4            5
            1    2020-07-23            6            5
            2    2020-07-26            6            6
            2    2020-07-28            5            6
            3    2020-07-22            4            4
            3    2020-07-27            4            4
            3    2020-07-30            6            4

The df is also quite big, so a faster way would be very much appreciated.

You can use pandas' groupby to group by sample ID, and then use the transform method to get the first value per sample ID. Note that this takes the first value by row order, not the first value by date, so make sure the rows are sorted by date.

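import pandas as pd

# Sample data from the question, already sorted by tested_at within each sample_id
df = pd.DataFrame(
    [
        [1, "2020-07-21", 5],
        [1, "2020-07-22", 4],
        [1, "2020-07-23", 6],
        [2, "2020-07-26", 6],
        [2, "2020-07-28", 5],
        [3, "2020-07-22", 4],
        [3, "2020-07-27", 4],
        [3, "2020-07-30", 6],
    ],
    columns=["sample_id", "tested_at", "test_value"],
)

# Broadcast each group's first test_value back to every row of that group
df["first_test"] = df.groupby("sample_id")["test_value"].transform("first")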

Which results in:

   sample_id   tested_at  test_value  first_test
0          1  2020-07-21           5           5
1          1  2020-07-22           4           5
2          1  2020-07-23           6           5
3          2  2020-07-26           6           6
4          2  2020-07-28           5           6
5          3  2020-07-22           4           4
6          3  2020-07-27           4           4
7          3  2020-07-30           6           4
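
If the rows are not guaranteed to be sorted by date, one possible approach is to sort a copy first and let pandas align the result back by index. The following is only a sketch under assumptions: the unsorted data below is made up for illustration (it is not the data from the question), and it assumes tested_at can be parsed with pd.to_datetime.

```python
import pandas as pd

# Hypothetical, deliberately unsorted data with the question's column names
df = pd.DataFrame(
    {
        "sample_id": [3, 1, 2, 1, 3, 2],
        "tested_at": [
            "2020-07-27", "2020-07-22", "2020-07-28",
            "2020-07-21", "2020-07-22", "2020-07-26",
        ],
        "test_value": [4, 4, 5, 5, 7, 6],
    }
)

# Parse the date strings so sorting is chronological
df["tested_at"] = pd.to_datetime(df["tested_at"])

# Sort a copy by id and date; df itself keeps its original row order
ordered = df.sort_values(["sample_id", "tested_at"])

# First value per group, computed on the date-ordered rows
first_test = ordered.groupby("sample_id")["test_value"].transform("first")

# Column assignment aligns on the index, so each row gets its own group's value
df["first_test"] = first_test

print(df)
```

Because the assignment aligns on the index, the original row order of df is preserved. Also note that "first" returns the first non-null value in each group, so a missing test_value in the earliest row would be skipped rather than propagated.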
