Creating an ID column in a pandas DataFrame

Question

I have a dataframe containing a trading log. My problem is that I do not have any ID to match the buy and sell of a stock. A stock can be traded many times, and I would like an ID that matches each finished trade. My original dataframe is a sequential time-series dataframe with timestamps. The example below illustrates the problem: I need to match traded stocks and assign them IDs in sequential order. A very simplified example:

df1 = pd.DataFrame({'stock': ['A', 'B', 'C', 'A','C', 'A', 'A'],
                        'deal': ['buy', 'buy', 'buy', 'sell','sell', 'buy', 'sell']}) 
df1
Out[84]: 
  stock  deal
0     A   buy
1     B   buy
2     C   buy
3     A  sell
4     C  sell
5     A   buy
6     A  sell

Here is my desired output:

df1 = pd.DataFrame({'stock': ['A', 'B', 'C', 'A','C', 'A', 'A'],
                    'deal': ['buy', 'buy', 'buy', 'sell','sell', 'buy', 'sell'],
                    'ID': [1, 2, 3, 1,3, 4, 4]}) 


df1
Out[82]: 
  stock  deal  ID
0     A   buy   1
1     B   buy   2
2     C   buy   3
3     A  sell   1
4     C  sell   3
5     A   buy   4
6     A  sell   4

Any ideas?

Answer 1

Try this:

m = df1['deal'] == 'buy'
df1['ID'] = m.cumsum().where(m)
df1['ID'] = df1.groupby('stock')['ID'].ffill()

df1

Output:

  stock  deal   ID
0     A   buy  1.0
1     B   buy  2.0
2     C   buy  3.0
3     A  sell  1.0
4     C  sell  3.0
5     A   buy  4.0
6     A  sell  4.0

Details:

Create a boolean series that is True where 'deal' equals 'buy'.
Take the cumulative sum of that series and assign it to 'ID' on the buy records only (the other rows stay NaN, which is why 'ID' shows up as a float).
Use groupby and ffill to carry each 'ID' forward to the next 'sell' record, grouped by 'stock'.
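To make the three steps easier to follow, here is the same approach with each intermediate result given its own name. The variable names (is_buy, buy_ids, filled) and the final cast to the nullable 'Int64' dtype are my additions for illustration, not part of the answer above.

```python
import pandas as pd

df1 = pd.DataFrame({'stock': ['A', 'B', 'C', 'A', 'C', 'A', 'A'],
                    'deal': ['buy', 'buy', 'buy', 'sell', 'sell', 'buy', 'sell']})

# Step 1: boolean mask, True on the 'buy' rows.
is_buy = df1['deal'] == 'buy'

# Step 2: running count of buys, kept only on the 'buy' rows (NaN elsewhere).
buy_ids = is_buy.cumsum().where(is_buy)

# Step 3: within each stock, carry the most recent buy ID forward onto the sell rows.
filled = buy_ids.groupby(df1['stock']).ffill()

# Optional (an addition): the NaN placeholders force a float dtype,
# so cast to the nullable integer dtype to get 1, 2, 3 instead of 1.0, 2.0, 3.0.
df1['ID'] = filled.astype('Int64')

print(df1)
```

One thing to keep in mind: ffill always carries forward the latest buy of a stock. If a stock is bought twice before it is sold (as in the "Another Example" input in Answer 2), every later sell of that stock gets the ID of the second buy, so this approach fits best when each buy is closed before the next buy of the same stock.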

Answer 2

Try this:

import pandas as pd
df1 = pd.DataFrame({'stock': ['A', 'B', 'C', 'A','C', 'A', 'A'],
                'deal': ['buy', 'buy', 'buy', 'sell','sell', 'buy', 'sell']})

def sequential_buy_sell_id_generator(df1):
    # Assumes the default 0..n-1 index, since rows are looked up by position.
    column_length = len(df1["stock"])
    found = [0] * column_length  # 1 once a row has been given an ID
    ids = [0] * column_length    # 0 means "no ID assigned"

    counter = 0

    for row_pointer_head in range(column_length):
        if df1["deal"][row_pointer_head] == "buy":
            # Each buy opens a new trade with the next ID.
            counter += 1
            found[row_pointer_head] = 1
            ids[row_pointer_head] = counter

            # Give the same ID to the first later, still unmatched sell of the same stock.
            for row_pointer_tail in range(row_pointer_head + 1, column_length):
                if (df1["stock"][row_pointer_head] == df1["stock"][row_pointer_tail]
                        and df1["deal"][row_pointer_tail] == "sell"
                        and found[row_pointer_tail] == 0):
                    found[row_pointer_tail] = 1
                    ids[row_pointer_tail] = counter
                    break

    df1 = df1.assign(id=ids)
    return df1


print(sequential_buy_sell_id_generator(df1))

Output:

  stock  deal  id
0     A   buy   1
1     B   buy   2
2     C   buy   3
3     A  sell   1
4     C  sell   3
5     A   buy   4
6     A  sell   4

Another example, where stock A is bought twice before it is sold:

df1 = pd.DataFrame({'stock': ['A', 'B', 'C', 'A','C', 'A', 'A'],
                'deal': ['buy', 'buy', 'buy', 'buy','sell', 'sell', 'sell']})

Output:

  stock  deal  id
0     A   buy   1
1     B   buy   2
2     C   buy   3
3     A   buy   4
4     C  sell   3
5     A  sell   1
6     A  sell   4
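The nested loop rescans the rest of the frame for every buy. As a rough sketch of an alternative, the same first-in, first-out pairing can be done in one pass by keeping a queue of open buy IDs per stock. The function name assign_trade_ids and the choice of 0 for a sell with no open buy are assumptions for illustration; the 0 is meant to mirror the default used by the loop above.

```python
from collections import defaultdict, deque

import pandas as pd


def assign_trade_ids(df):
    """Hypothetical helper: pair each sell with the oldest open buy of the same stock."""
    open_buys = defaultdict(deque)  # stock -> IDs of buys still waiting for a sell
    ids = []
    next_id = 0
    for stock, deal in zip(df['stock'], df['deal']):
        if deal == 'buy':
            # A buy opens a new trade and queues its ID for a later sell.
            next_id += 1
            open_buys[stock].append(next_id)
            ids.append(next_id)
        elif deal == 'sell' and open_buys[stock]:
            # A sell closes the oldest open buy of that stock.
            ids.append(open_buys[stock].popleft())
        else:
            # Assumption: 0 marks a row that could not be matched.
            ids.append(0)
    return df.assign(id=ids)


df_overlap = pd.DataFrame({'stock': ['A', 'B', 'C', 'A', 'C', 'A', 'A'],
                           'deal': ['buy', 'buy', 'buy', 'buy', 'sell', 'sell', 'sell']})
print(assign_trade_ids(df_overlap))
```

On both example inputs in this answer it should produce the same id column as the nested-loop version, without the inner scan.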
