How do you identify which IDs have values in another column of a Python dataframe that increase over time?

Question

Suppose I have a dataframe with 3 columns:

| id | value |    date   |
+====+=======+===========+
|  1 |   50  |  1-Feb-19 |
+----+-------+-----------+
|  1 |  100  |  5-Feb-19 |
+----+-------+-----------+
|  1 |  200  |  6-Jun-19 |
+----+-------+-----------+
|  1 |  500  |  1-Dec-19 |
+----+-------+-----------+
|  2 |   10  |  6-Jul-19 |
+----+-------+-----------+
|  3 |  500  |  1-Mar-19 |
+----+-------+-----------+
|  3 |  200  |  5-Apr-19 |
+----+-------+-----------+
|  3 |  100  | 30-Jun-19 |
+----+-------+-----------+
|  3 |   10  | 25-Dec-19 |
+----+-------+-----------+

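The id column contains the ID of a particular person. The value column contains the value of their transaction. The date column contains the date of their transaction.

Is there a way in Python to identify ID 1 as the ID whose transaction value increases over time?

I'm looking for some way to extract ID 1 as the ID I want, since its transaction values increase, while filtering out ID 2 because it doesn't have enough transactions to analyze a trend, and also filtering out ID 3 because its transaction values trend downward over time.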
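
For anyone who wants to reproduce the examples, the table above could be built roughly like this. It is only a sketch: the `%d-%b-%y` format string is an assumption based on how the dates are printed, and parsing the month abbreviations assumes an English locale.

```python
import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame({
    "id":    [1, 1, 1, 1, 2, 3, 3, 3, 3],
    "value": [50, 100, 200, 500, 10, 500, 200, 100, 10],
    "date":  ["1-Feb-19", "5-Feb-19", "6-Jun-19", "1-Dec-19", "6-Jul-19",
              "1-Mar-19", "5-Apr-19", "30-Jun-19", "25-Dec-19"],
})

# Assumption: dates are day-month-year. Parsing them into timestamps makes
# sorting chronological; as plain strings they would sort alphabetically.
df["date"] = pd.to_datetime(df["date"], format="%d-%b-%y")
print(df)
```

Converting the date column matters for any approach that sorts by date.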

Answer 1

Maybe group by id and check whether the values come out in the same order whether you sort by value or by date:

>>> df.groupby('id').apply( lambda x:
...    (
...        x.sort_values('value', ignore_index=True)['value'] == x.sort_values('date', ignore_index=True)['value']
...    ).all()
... )
id
1     True
2     True
3    False
dtype: bool

Edit:

To make id=2 come out as False, we can do:

>>> df.groupby('id').apply( lambda x:
...    (
...        (x.sort_values('value', ignore_index=True)['value'] == x.sort_values('date', ignore_index=True)['value'])
...        & (len(x) > 1)
...    ).all()
... )
id
1     True
2    False
3    False
dtype: bool
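
A self-contained variant of the same idea, for comparison. This is a sketch rather than the code above: it swaps the two-sort comparison for `Series.is_monotonic_increasing`, and it assumes the dates parse with `%d-%b-%y`. The helper name `strictly_increasing` is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "id":    [1, 1, 1, 1, 2, 3, 3, 3, 3],
    "value": [50, 100, 200, 500, 10, 500, 200, 100, 10],
    "date":  ["1-Feb-19", "5-Feb-19", "6-Jun-19", "1-Dec-19", "6-Jul-19",
              "1-Mar-19", "5-Apr-19", "30-Jun-19", "25-Dec-19"],
})
# Assumption: day-month-year strings; parse so sorting is chronological
df["date"] = pd.to_datetime(df["date"], format="%d-%b-%y")


def strictly_increasing(values: pd.Series) -> bool:
    """Hypothetical helper: True if there are 2+ values and each exceeds the previous."""
    # is_monotonic_increasing allows ties, so also require unique values
    return bool(len(values) > 1
                and values.is_monotonic_increasing
                and values.is_unique)


# Put each id's transactions in date order, then test the value sequence
result = (
    df.sort_values(["id", "date"])
      .groupby("id")["value"]
      .apply(strictly_increasing)
)
increasing_ids = result.index[result.to_numpy(dtype=bool)].tolist()
print(increasing_ids)  # [1]
```

Requiring more than one row is what excludes id 2 here, in the same way as the `len(x) > 1` condition in the edit.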

Answer 2

import numpy as np

df['new'] = df.groupby(['id'])['value'].transform(lambda x : \
                      np.where(x.diff()>0,'increase',
                      np.where(x.diff()<0,'decrease','--')))

df = df.groupby('id').new.agg(['last'])
df

Output:

      last
id  
1   increase
2   --
3   decrease

IDs that only increase:

increasingList = df[(df['last']=='increase')].index.values
print(increasingList)

Result:

[1]

This assumes that the following does not happen:

1  50
1  100
1  50

If it does, then:

df['new'] = df.groupby(['id'])['value'].transform(lambda x : \
                      np.where(x.diff()>0,'increase',
                      np.where(x.diff()<0,'decrease','--')))
df

Output:

    value   new
id      
1   50  --
1   100 increase
1   200 increase
2   10  --
3   500 --
3   300 decrease
3   100 decrease

Join the strings:

df = df.groupby(['id'])['new'].apply(lambda x: ','.join(x)).reset_index()
df

Intermediate result:

    id  new
0   1   --,increase,increase
1   2   --
2   3   --,decrease,decrease

Check whether a row contains a decrease or consists only of "--", and drop those rows:

df = df.drop(df[df['new'].str.contains("dec")].index.values)
df = df.drop(df[(df['new']=='--')].index.values)
df

Result:

    id  new
0   1   --,increase,increase
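
The same per-step check can also be done without building strings. The following is a rough sketch, not the code above: it works on the numeric differences directly, assumes the dates parse with `%d-%b-%y`, and adds a hypothetical id 4 (with invented dates) to cover the 50, 100, 50 case.

```python
import pandas as pd

df = pd.DataFrame({
    "id":    [1, 1, 1, 1, 2, 3, 3, 3, 3, 4, 4, 4],
    "value": [50, 100, 200, 500, 10, 500, 200, 100, 10, 50, 100, 50],
    "date":  ["1-Feb-19", "5-Feb-19", "6-Jun-19", "1-Dec-19", "6-Jul-19",
              "1-Mar-19", "5-Apr-19", "30-Jun-19", "25-Dec-19",
              # id 4 is hypothetical: it rises and then falls back
              "1-Jan-19", "1-Feb-19", "1-Mar-19"],
})
# Assumption: day-month-year strings; parse so sorting is chronological
df["date"] = pd.to_datetime(df["date"], format="%d-%b-%y")

ordered = df.sort_values(["id", "date"])
# Change from the previous transaction of the same id (NaN on the first row)
ordered["step"] = ordered.groupby("id")["value"].diff()

# Dropping the NaN rows removes ids with a single transaction entirely
steps = ordered.dropna(subset=["step"])
all_up = (steps["step"] > 0).groupby(steps["id"]).all()

increasing_ids = all_up[all_up].index.tolist()
print(increasing_ids)  # [1]
```

Because every step has to be positive, an id that goes up and then down is rejected, and an id with only one row never reaches the final check.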


Answer 1
2 2020-08-13 20:31:40

Answer 2
1 Accepted 2020-08-13 20:28:46
