Pandas: Create a matrix of price differences?

I am trying to compute the price differences for each coin across the exchanges it trades on. For example, I have this dataframe:

    Exchange coin           lastUpdate    price   volume
0   Bitfinex  BTC  2019-06-23 06:23:27    10646  24299.4
1   Bitfinex  ETH  2019-06-23 06:23:13   308.47   225945
2   Bitfinex  LTC  2019-06-23 06:23:18   140.41   215698
3   Bitstamp  BTC  2019-06-23 06:23:21  10546.4  9620.04
4   Bitstamp  ETH  2019-06-23 06:22:48   305.15  46062.6
5   Bitstamp  LTC  2019-06-23 06:22:46   139.22  85160.5
6     CCCAGG  BTC  2019-06-23 06:23:23  10580.4  79049.8
7     CCCAGG  ETH  2019-06-23 06:23:20   306.74   681056
8     CCCAGG  LTC  2019-06-23 06:23:24   139.71   752875
9   Coinbase  BTC  2019-06-23 06:23:17  10557.5  23731.2
10  Coinbase  ETH  2019-06-23 06:23:11   306.09   247213
11  Coinbase  LTC  2019-06-23 06:23:13   139.49   381421

I want to get the price difference for every pair of exchanges that a coin trades on.

I want the result to look like this:

price_combos                        diff
Price Diff: BTC - Bitfinex-Bitstamp 14.06
Price Diff: BTC - Bitfinex-CCCAGG   14.32
Price Diff: BTC - Bitstamp-CCCAGG   0.26
Price Diff: BTC - Coinbase-Bitfinex -17.99
Price Diff: BTC - Coinbase-Bitstamp -3.93
Price Diff: BTC - Coinbase-CCCAGG   -3.67

And then repeat this for each coin.

Edit: Added price to the combinations. Note that the diff values come from a different set of data, so they won't match the actual differences in the first dataframe.

We can approach this problem as follows:

  1. Do an outer merge of the dataframe with itself on coin, which gives back every combination of exchanges per coin.
  2. Use ne (not equal) to filter out the rows where both exchanges are the same (we don't want to compare an exchange with itself).
  3. Create the Price Diff column by subtracting one price from the other.
# Step 1 outer merge
df2 = df[['Exchange', 'coin', 'price']].merge(df[['Exchange', 'coin', 'price']], 
                                              on='coin', 
                                              how='outer', 
                                              suffixes=['', '_2'])

# Step 2 filter out same exchange
df2 = df2[df2['Exchange'].ne(df2['Exchange_2'])]

# Step 3 create Price Diff column
df2['Price Diff'] = df2['price'] - df2['price_2']

    Exchange coin     price Exchange_2   price_2  Price Diff
1   Bitfinex  BTC  10646.00   Bitstamp  10546.40       99.60
2   Bitfinex  BTC  10646.00     CCCAGG  10580.40       65.60
3   Bitfinex  BTC  10646.00   Coinbase  10557.50       88.50
4   Bitstamp  BTC  10546.40   Bitfinex  10646.00      -99.60
6   Bitstamp  BTC  10546.40     CCCAGG  10580.40      -34.00
7   Bitstamp  BTC  10546.40   Coinbase  10557.50      -11.10
8     CCCAGG  BTC  10580.40   Bitfinex  10646.00      -65.60
9     CCCAGG  BTC  10580.40   Bitstamp  10546.40       34.00
11    CCCAGG  BTC  10580.40   Coinbase  10557.50       22.90
12  Coinbase  BTC  10557.50   Bitfinex  10646.00      -88.50
13  Coinbase  BTC  10557.50   Bitstamp  10546.40       11.10
14  Coinbase  BTC  10557.50     CCCAGG  10580.40      -22.90
17  Bitfinex  ETH    308.47   Bitstamp    305.15        3.32
18  Bitfinex  ETH    308.47     CCCAGG    306.74        1.73
19  Bitfinex  ETH    308.47   Coinbase    306.09        2.38
20  Bitstamp  ETH    305.15   Bitfinex    308.47       -3.32
22  Bitstamp  ETH    305.15     CCCAGG    306.74       -1.59
23  Bitstamp  ETH    305.15   Coinbase    306.09       -0.94
24    CCCAGG  ETH    306.74   Bitfinex    308.47       -1.73
25    CCCAGG  ETH    306.74   Bitstamp    305.15        1.59
27    CCCAGG  ETH    306.74   Coinbase    306.09        0.65
28  Coinbase  ETH    306.09   Bitfinex    308.47       -2.38
29  Coinbase  ETH    306.09   Bitstamp    305.15        0.94
30  Coinbase  ETH    306.09     CCCAGG    306.74       -0.65
33  Bitfinex  LTC    140.41   Bitstamp    139.22        1.19
34  Bitfinex  LTC    140.41     CCCAGG    139.71        0.70
35  Bitfinex  LTC    140.41   Coinbase    139.49        0.92
36  Bitstamp  LTC    139.22   Bitfinex    140.41       -1.19
38  Bitstamp  LTC    139.22     CCCAGG    139.71       -0.49
39  Bitstamp  LTC    139.22   Coinbase    139.49       -0.27
40    CCCAGG  LTC    139.71   Bitfinex    140.41       -0.70
41    CCCAGG  LTC    139.71   Bitstamp    139.22        0.49
43    CCCAGG  LTC    139.71   Coinbase    139.49        0.22
44  Coinbase  LTC    139.49   Bitfinex    140.41       -0.92
45  Coinbase  LTC    139.49   Bitstamp    139.22        0.27
46  Coinbase  LTC    139.49     CCCAGG    139.71       -0.22
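
The merge approach returns every pair twice (A-B and B-A). If each pair should appear only once, with a label in the format the question asks for, one possible variation is sketched below. The alphabetical filter and the names pairs and result are assumptions made for illustration, not part of the original answer, and the differences are rounded to two decimals to hide floating-point noise.

```python
import pandas as pd

# Prices copied from the question's dataframe (lastUpdate and volume omitted).
df = pd.DataFrame(
    [
        ["Bitfinex", "BTC", 10646.0], ["Bitfinex", "ETH", 308.47], ["Bitfinex", "LTC", 140.41],
        ["Bitstamp", "BTC", 10546.4], ["Bitstamp", "ETH", 305.15], ["Bitstamp", "LTC", 139.22],
        ["CCCAGG", "BTC", 10580.4], ["CCCAGG", "ETH", 306.74], ["CCCAGG", "LTC", 139.71],
        ["Coinbase", "BTC", 10557.5], ["Coinbase", "ETH", 306.09], ["Coinbase", "LTC", 139.49],
    ],
    columns=["Exchange", "coin", "price"],
)

# Self-merge on coin: pairs every exchange with every exchange for that coin.
pairs = df.merge(df, on="coin", suffixes=("", "_2"))

# Keep each unordered pair once. The strict alphabetical comparison drops
# both the same-exchange rows and the mirrored duplicates (B-A when A-B exists).
pairs = pairs[pairs["Exchange"] < pairs["Exchange_2"]].copy()
pairs = pairs.sort_values(["coin", "Exchange", "Exchange_2"])

# Build the label and the difference (first exchange minus second exchange).
pairs["price_combos"] = (
    "Price Diff: " + pairs["coin"] + " - " + pairs["Exchange"] + "-" + pairs["Exchange_2"]
)
pairs["diff"] = (pairs["price"] - pairs["price_2"]).round(2)

result = pairs[["price_combos", "diff"]].reset_index(drop=True)
print(result)
```

This yields 18 rows (6 pairs for each of the 3 coins). The sign of each diff depends on which exchange sorts first alphabetically, so the order within a pair may need to be adapted if a specific direction is wanted.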

You should have a look at the itertools module. It has a lot of handy functions for iteration.

Here you are looking for the combinations function.

Once you have the combinations, the rest is simple:

# Import modules
import pandas as pd
import itertools as iter

# Your data
df = pd.DataFrame([
    ["Bitfinex",  "BTC", "2019-06-23 06:23:27",  10646, 24299.4],
    ["Bitfinex",  "ETH", "2019-06-23 06:23:13",  308.47,  225945],
    ["Bitfinex",  "LTC", "2019-06-23 06:23:18",  140.41,  215698],
    ["Bitstamp",  "BTC", "2019-06-23 06:23:21", 10546.4, 9620.04],
    ["Bitstamp",  "ETH", "2019-06-23 06:22:48",  305.15, 46062.6],
    ["Bitstamp", "LTC", "2019-06-23 06:22:46", 139.22, 85160.5],
    ["CCCAGG",  "BTC", "2019-06-23 06:23:23", 10580.4, 79049.8],
    ["CCCAGG",  "ETH", "2019-06-23 06:23:20", 306.74,  681056],
    ["CCCAGG",  "LTC", "2019-06-23 06:23:24", 139.71, 752875],
    ["Coinbase",  "BTC", "2019-06-23 06:23:17", 10557.5, 23731.2],
    ["Coinbase", "ETH", "2019-06-23 06:23:11", 306.09, 247213],
    ["Coinbase", "LTC", "2019-06-23 06:23:13", 139.49,  381421],
], columns=["Exchange", "coin", "lastUpdate", "price", "volume"])


# Print all combinations for one coin
def print_combi(df, coin):
    # subset dataframe with matching rows
    sub_df = df[df["coin"] == coin]
    # Create all combinations for the exchange columns
    list_combi = [cb for cb in iter.combinations(sub_df.Exchange, 2)]

    # Print the expected output
    for combi in list_combi:
        print("Price diff: {0} - {1}-{2}".format(coin, combi[0], combi[1]))

print_combi(df, 'BTC')
# Price diff: BTC - Bitfinex-Bitstamp
# Price diff: BTC - Bitfinex-CCCAGG
# Price diff: BTC - Bitfinex-Coinbase
# Price diff: BTC - Bitstamp-CCCAGG
# Price diff: BTC - Bitstamp-Coinbase
# Price diff: BTC - CCCAGG-Coinbase

EDIT1:

Return a dataframe. The diff column is computed from the data used in the snippet above.

def combo_money_df(df, coin):
    # subset the dataframe
    sub_df = df[df["coin"] == coin]

    new_data = []
    # For each subset
    for combi in iter.combinations(sub_df.index, 2):
        # Select corresponding row
        row_1 = sub_df.loc[combi[0]]
        row_2 = sub_df.loc[combi[1]]
        # Create new rows
        new_data.append([row_1.Exchange + "-" + row_2.Exchange, row_1.price - row_2.price])
    # Return a dataframe object
    return pd.DataFrame(new_data, columns=["price_combo", "diff"])

print(combo_money_df(df, "BTC"))
#          price_combo  diff
# 0  Bitfinex-Bitstamp  99.6
# 1    Bitfinex-CCCAGG  65.6
# 2  Bitfinex-Coinbase  88.5
# 3    Bitstamp-CCCAGG -34.0
# 4  Bitstamp-Coinbase -11.1
# 5    CCCAGG-Coinbase  22.9
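
The question also asks to repeat this for each coin. A minimal sketch of one way to do that is to group by coin and apply the same combinations logic to each group. The function name price_diffs and the extra coin column are assumptions made for illustration, and the differences are rounded to two decimals.

```python
import itertools

import pandas as pd

# Prices copied from the question's dataframe (lastUpdate and volume omitted).
df = pd.DataFrame(
    [
        ["Bitfinex", "BTC", 10646.0], ["Bitfinex", "ETH", 308.47], ["Bitfinex", "LTC", 140.41],
        ["Bitstamp", "BTC", 10546.4], ["Bitstamp", "ETH", 305.15], ["Bitstamp", "LTC", 139.22],
        ["CCCAGG", "BTC", 10580.4], ["CCCAGG", "ETH", 306.74], ["CCCAGG", "LTC", 139.71],
        ["Coinbase", "BTC", 10557.5], ["Coinbase", "ETH", 306.09], ["Coinbase", "LTC", 139.49],
    ],
    columns=["Exchange", "coin", "price"],
)


def price_diffs(df):
    """Return one row per coin and unordered pair of exchanges (hypothetical helper)."""
    rows = []
    # groupby iterates over the coins in sorted order: BTC, ETH, LTC
    for coin, group in df.groupby("coin"):
        # Each record is a namedtuple exposing .Exchange and .price
        records = group.itertuples(index=False)
        for a, b in itertools.combinations(records, 2):
            rows.append(
                {
                    "coin": coin,
                    "price_combo": f"{a.Exchange}-{b.Exchange}",
                    "diff": round(a.price - b.price, 2),
                }
            )
    return pd.DataFrame(rows, columns=["coin", "price_combo", "diff"])


all_diffs = price_diffs(df)
print(all_diffs)
```

With four exchanges per coin this gives 6 pairs per coin, so 18 rows in total, and the BTC rows match the output of combo_money_df above.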
