
Create ID column in a pandas dataframe

I have a dataframe containing a trading log. My problem is that I do not have any ID to match the buy and sell of a stock. A stock can be traded many times, and I would like an ID that matches each finished trade. My original dataframe is a sequential time-series dataframe with timestamps. The example below illustrates the problem: I need to match each buy with its sell and assign IDs in sequential order. A very simplified example:

import pandas as pd

df1 = pd.DataFrame({'stock': ['A', 'B', 'C', 'A', 'C', 'A', 'A'],
                    'deal': ['buy', 'buy', 'buy', 'sell', 'sell', 'buy', 'sell']})
df1
Out[84]: 
  stock  deal
0     A   buy
1     B   buy
2     C   buy
3     A  sell
4     C  sell
5     A   buy
6     A  sell   
    

Here is my desired output:

df1 = pd.DataFrame({'stock': ['A', 'B', 'C', 'A','C', 'A', 'A'],
                    'deal': ['buy', 'buy', 'buy', 'sell','sell', 'buy', 'sell'],
                    'ID': [1, 2, 3, 1,3, 4, 4]}) 


df1
Out[82]: 
  stock  deal  ID
0     A   buy   1
1     B   buy   2
2     C   buy   3
3     A  sell   1
4     C  sell   3
5     A   buy   4
6     A  sell   4

Any ideas?

Try this:

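import pandas as pd

m = df1['deal'] == 'buy'                         # True on buy rows
df1['ID'] = m.cumsum().where(m)                  # running buy count on buy rows, NaN on sells
df1['ID'] = df1.groupby('stock')['ID'].ffill()   # carry each stock's last buy ID onto its sells

df1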

Output:

  stock  deal   ID
0     A   buy  1.0
1     B   buy  2.0
2     C   buy  3.0
3     A  sell  1.0
4     C  sell  3.0
5     A   buy  4.0
6     A  sell  4.0

Details:

  • Create a boolean series that is True where deal equals 'buy'.
  • Take its cumulative sum and assign it to 'ID' on the buy records only (where() leaves NaN on the other rows).
  • Use groupby and ffill to carry each 'ID' forward to the following 'sell' records of the same 'stock'.
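To make the three steps visible, here is a small sketch that keeps the intermediate results as extra columns for the question's data. The helper column names `is_buy` and `buy_no` are hypothetical, added only for illustration, and the final `astype('Int64')` (pandas' nullable integer dtype) is an optional extra that turns the float IDs shown above into integers while still allowing missing values.

```python
import pandas as pd

df1 = pd.DataFrame({'stock': ['A', 'B', 'C', 'A', 'C', 'A', 'A'],
                    'deal': ['buy', 'buy', 'buy', 'sell', 'sell', 'buy', 'sell']})

steps = df1.copy()
# step 1: boolean mask, True on buy rows
steps['is_buy'] = steps['deal'] == 'buy'
# step 2: running count of buys, kept only on buy rows (NaN on sells)
steps['buy_no'] = steps['is_buy'].cumsum().where(steps['is_buy'])
# step 3: forward-fill the buy number within each stock onto later sells
steps['ID'] = steps.groupby('stock')['buy_no'].ffill()
# optional: float IDs -> nullable integers
steps['ID'] = steps['ID'].astype('Int64')

print(steps)
```

The `buy_no` column is NaN exactly on the sell rows, and step 3 fills those gaps from the previous buy of the same stock.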

Try this:

import pandas as pd
df1 = pd.DataFrame({'stock': ['A', 'B', 'C', 'A','C', 'A', 'A'],
                'deal': ['buy', 'buy', 'buy', 'sell','sell', 'buy', 'sell']})

def sequential_buy_sell_id_generator(df1):

    column_length = len(df1)
    found = [0] * column_length   # 1 once a row has been given an ID
    ids = [0] * column_length     # trade IDs ("id" would shadow the builtin)

    # positional lists, so lookups do not depend on the dataframe's index labels
    stocks = df1["stock"].tolist()
    deals = df1["deal"].tolist()

    counter = 0

    for row_pointer_head in range(column_length):
        if deals[row_pointer_head] == "buy":
            counter += 1                      # every buy opens a new trade ID
            found[row_pointer_head] = 1
            ids[row_pointer_head] = counter

            # give the same ID to the first unmatched later sell of this stock
            for row_pointer_tail in range(row_pointer_head + 1, column_length):
                if (stocks[row_pointer_tail] == stocks[row_pointer_head]
                        and deals[row_pointer_tail] == "sell"
                        and found[row_pointer_tail] == 0):
                    found[row_pointer_tail] = 1
                    ids[row_pointer_tail] = counter
                    break

    return df1.assign(id=ids)


print(sequential_buy_sell_id_generator(df1))

Output:

  stock  deal  id
0     A   buy   1
1     B   buy   2
2     C   buy   3
3     A  sell   1
4     C  sell   3
5     A   buy   4
6     A  sell   4
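The nested loops rescan the rest of the frame for every buy, so the work grows quadratically with the number of rows. Assuming the same rule (each sell closes the earliest still-open buy of that stock, i.e. first-in, first-out matching), a single pass that keeps one queue of open trade IDs per stock should produce the same IDs. This is a sketch, not part of either answer; `fifo_trade_ids` is a hypothetical name, and unmatched rows get 0 to mirror the loop version.

```python
from collections import defaultdict, deque

import pandas as pd


def fifo_trade_ids(df):
    """One pass: every buy opens a new ID, every sell closes the oldest
    still-open buy of the same stock (FIFO)."""
    open_trades = defaultdict(deque)  # stock -> IDs of buys not yet sold
    ids = []
    counter = 0
    for stock, deal in zip(df["stock"], df["deal"]):
        if deal == "buy":
            counter += 1
            open_trades[stock].append(counter)
            ids.append(counter)
        elif deal == "sell" and open_trades[stock]:
            ids.append(open_trades[stock].popleft())  # oldest open buy
        else:
            ids.append(0)  # sell with no open buy (or other deal): 0, as in the loop
    return df.assign(id=ids)


df1 = pd.DataFrame({'stock': ['A', 'B', 'C', 'A', 'C', 'A', 'A'],
                    'deal': ['buy', 'buy', 'buy', 'sell', 'sell', 'buy', 'sell']})
df2 = pd.DataFrame({'stock': ['A', 'B', 'C', 'A', 'C', 'A', 'A'],
                    'deal': ['buy', 'buy', 'buy', 'buy', 'sell', 'sell', 'sell']})

print(fifo_trade_ids(df1)["id"].tolist())  # [1, 2, 3, 1, 3, 4, 4]
print(fifo_trade_ids(df2)["id"].tolist())  # [1, 2, 3, 4, 3, 1, 4]
```

Using `popleft()` on a deque makes closing the oldest open trade O(1); switching it to `pop()` would give last-in, first-out matching instead.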

Another example:

For

df1 = pd.DataFrame({'stock': ['A', 'B', 'C', 'A', 'C', 'A', 'A'],
                    'deal': ['buy', 'buy', 'buy', 'buy', 'sell', 'sell', 'sell']})

the output is:

  stock  deal  id
0     A   buy   1
1     B   buy   2
2     C   buy   3
3     A   buy   4
4     C  sell   3
5     A  sell   1
6     A  sell   4
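This second example is where the two answers differ. The groupby/ffill answer gives every sell the ID of the most recent buy of that stock, so when stock A has two open buys (rows 0 and 3), both of its sells receive ID 4, whereas the loop pairs them first-in, first-out (1, then 4). A quick check that reuses the first answer's three lines unchanged on this data:

```python
import pandas as pd

df2 = pd.DataFrame({'stock': ['A', 'B', 'C', 'A', 'C', 'A', 'A'],
                    'deal': ['buy', 'buy', 'buy', 'buy', 'sell', 'sell', 'sell']})

# first answer: cumulative buy count, forward-filled within each stock
m = df2['deal'] == 'buy'
df2['ID'] = m.cumsum().where(m)
df2['ID'] = df2.groupby('stock')['ID'].ffill()

print(df2['ID'].tolist())  # [1.0, 2.0, 3.0, 4.0, 3.0, 4.0, 4.0]
```

So the vectorised version is only safe when a stock is never bought again before its previous position has been sold; with overlapping positions, the loop version matches trades the way the desired output expects.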
