Get the first occurrence within a column partition ordered by another column in pandas

Question

My sample code:

import pandas as pd
df = pd.DataFrame({"ID":['1','1','1','2','2'],
                   "LINE":['1','3','2','1','2'],
                   "TYPE":['0','1','1','1','0']})
# print results
print(df.head())

# a function to label the first type 1 for each ID sorted by line
# currently it only filters to type 1
def label (row):
    if row.TYPE == '1' :
        return True

# add the label in the dataframe
df['label'] = df.apply (lambda row: label(row), axis=1)

# print results
print(df.head())

For each unique ID, with rows sorted by LINE, I want to flag the first occurrence of TYPE == 1. The final result should be:

  ID LINE TYPE label
0  1    1    0  None
1  1    3    1  None
2  1    2    1  True
3  2    1    1  True
4  2    2    0  None

I used a small example in this question, but I am actually working with 3 million rows and would like to know the most efficient way to do this.

Answer 1

Use query to keep only the rows where TYPE == 1, sort_values to order them by LINE, and finally GroupBy.head to take the first occurrence per ID:

s = df.query('TYPE == "1"').sort_values('LINE').groupby('ID')['TYPE'].head(1)
df['label'] = df.index.isin(s.index)
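
The index-membership step works because GroupBy.head keeps the original row labels of the rows it selects. A minimal, self-contained sketch on the question's sample data (the print call is added only for illustration):

```python
import pandas as pd

df = pd.DataFrame({"ID": ['1', '1', '1', '2', '2'],
                   "LINE": ['1', '3', '2', '1', '2'],
                   "TYPE": ['0', '1', '1', '1', '0']})

# Keep TYPE == "1" rows, order them by LINE, take the first row per ID.
s = df.query('TYPE == "1"').sort_values('LINE').groupby('ID')['TYPE'].head(1)

# s still carries the row labels from df, so they can be matched back.
print(sorted(s.index))

df['label'] = df.index.isin(s.index)
```

Rows 2 and 3 are the ones selected here, which matches the expected result in the question.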

Or use drop_duplicates, which should be more efficient:

s = df.query('TYPE == "1"').sort_values('LINE').drop_duplicates('ID')
df['label'] = df.index.isin(s.index)

  ID LINE TYPE  label
0  1    1    0  False
1  1    3    1  False
2  1    2    1   True
3  2    1    1   True
4  2    2    0  False
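
One caveat worth checking on real data: in the sample, LINE holds strings, so sort_values orders it lexicographically and '10' would sort before '9'. The sketch below assumes LINE should be ordered numerically and sorts on a numeric copy. The data with a two-digit LINE and the helper column name line_num are hypothetical, added only to show the difference.

```python
import pandas as pd

# Hypothetical data: same shape as the question, but with a two-digit LINE value.
df = pd.DataFrame({"ID":   ['1', '1', '1', '2', '2'],
                   "LINE": ['1', '10', '9', '1', '2'],
                   "TYPE": ['0', '1', '1', '1', '0']})

# Keep TYPE == "1" rows and attach a numeric copy of LINE for sorting.
candidates = df[df['TYPE'] == '1'].assign(
    line_num=lambda d: pd.to_numeric(d['LINE'])
)

# Sort numerically, then keep the first row per ID.
first = candidates.sort_values('line_num').drop_duplicates('ID')

# Flag the selected rows in the original frame.
df['label'] = df.index.isin(first.index)
print(df)
```

With a plain string sort, ID 1 would pick the row with LINE '10'; with the numeric copy it picks the row with LINE '9'. If LINE is already numeric in the real data, the conversion step is unnecessary.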

Solution 1
3, accepted, 2020-03-30 23:02:47
