pandas: taking the mean of two string values in one cell

Question

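The example below has some bid/ask prices. What is a good way to compute the mean (the mid) of every cell across the whole df?

import pandas as pd

#---sample df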
prices = pd.DataFrame({
    'tenor':['5Y', '10Y', '15Y', '20Y', '30Y'],
    '1M':['0.67/0.62', '1.10/1.05', '1.23/1.18', '1.38/1.33', '1.55/1.50'],
    '3M':['0.79/0.74', '1.19/1.14', '1.32/1.27', '1.49/1.44', '1.65/1.60'],
    '6M':['0.89/0.84', '1.29/1.24', '1.42/1.37', '1.60/1.55', '1.76/1.71'],
    '12M':['1.14/1.07', '1.47/1.40', '1.61/1.54', '1.80/1.72', '1.95/1.87']
    })

For example, the cell below should return 0.645.

prices.iat[0,1]
Out[112]: '0.67/0.62'
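
To make the target concrete, here is a minimal sketch of the calculation for a single cell. The helper name `mid_price` is hypothetical and not part of the original question; it assumes every cell is a string of numbers separated by `/`.

```python
def mid_price(cell, sep='/'):
    """Return the mean of the numbers packed into one 'a/b' string."""
    parts = [float(p) for p in cell.split(sep)]
    return sum(parts) / len(parts)

# The cell from the question: (0.67 + 0.62) / 2, approximately 0.645
print(mid_price('0.67/0.62'))
```

The answers below are different ways of applying this same split-and-average step to every cell at once.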

Answer 1

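You can split every cell on / and then take the mean. Setting the non-numeric column as the index first lets you use applymap to process the rest of the df in one pass. (In pandas 2.1 and later, applymap is deprecated in favor of DataFrame.map.)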

import numpy as np
import pandas as pd
prices = pd.DataFrame({
    'tenor':['5Y', '10Y', '15Y', '20Y', '30Y'],
    '1M':['0.67/0.62', '1.10/1.05', '1.23/1.18', '1.38/1.33', '1.55/1.50'],
    '3M':['0.79/0.74', '1.19/1.14', '1.32/1.27', '1.49/1.44', '1.65/1.60'],
    '6M':['0.89/0.84', '1.29/1.24', '1.42/1.37', '1.60/1.55', '1.76/1.71'],
    '12M':['1.14/1.07', '1.47/1.40', '1.61/1.54', '1.80/1.72', '1.95/1.87']
    })

prices = prices.set_index('tenor').applymap(lambda x: np.mean(list(map(float,x.split('/'))))).reset_index()

Output

  tenor     1M     3M     6M    12M
0    5Y  0.645  0.765  0.865  1.105
1   10Y  1.075  1.165  1.265  1.435
2   15Y  1.205  1.295  1.395  1.575
3   20Y  1.355  1.465  1.575  1.760
4   30Y  1.525  1.625  1.735  1.910
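
Since `applymap` is deprecated from pandas 2.1 onward, the same idea can be written so that it runs on either side of that change. This is only a sketch: the `elementwise` and `mid_price` names are hypothetical, and the two-row frame is a cut-down copy of the sample data.

```python
import pandas as pd

def mid_price(cell):
    """Mean of the two numbers in an 'a/b' string."""
    first, second = (float(p) for p in cell.split('/'))
    return (first + second) / 2

prices = pd.DataFrame({
    'tenor': ['5Y', '10Y'],
    '1M': ['0.67/0.62', '1.10/1.05'],
    '3M': ['0.79/0.74', '1.19/1.14'],
})

body = prices.set_index('tenor')
# DataFrame.map exists from pandas 2.1; fall back to applymap on older versions
elementwise = body.map if hasattr(body, 'map') else body.applymap
mids = elementwise(mid_price).reset_index()
print(mids)
```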

Answer 2

For each column, you can split the strings on / and run a lambda to get the mean:

prices["1M"].str.split('/').apply(lambda x : (float(x[0])+float(x[1]))/2)

0    0.645
1    1.075
2    1.205
3    1.355
4    1.525
Name: 1M, dtype: float64
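
The snippet above handles one column. A possible way to cover every price column, sketched here under the assumption that `tenor` is the only non-price column, is to loop over the columns and let `str.split(expand=True)` produce the two numeric halves. The names `value_cols`, `mids` and `parts` are illustrative only.

```python
import pandas as pd

prices = pd.DataFrame({
    'tenor': ['5Y', '10Y', '15Y', '20Y', '30Y'],
    '1M': ['0.67/0.62', '1.10/1.05', '1.23/1.18', '1.38/1.33', '1.55/1.50'],
    '12M': ['1.14/1.07', '1.47/1.40', '1.61/1.54', '1.80/1.72', '1.95/1.87'],
})

value_cols = [c for c in prices.columns if c != 'tenor']
mids = prices[['tenor']].copy()
for col in value_cols:
    # expand=True returns one DataFrame column per piece of the split
    parts = prices[col].str.split('/', expand=True).astype(float)
    mids[col] = parts.mean(axis=1)

print(mids)
```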

Answer 3

Here is another solution:

x = prices.iloc[:,1:].unstack().swaplevel(1,0).str.split('/').explode().astype(float)
temp1 = x.groupby(x.index).mean().reindex(pd.MultiIndex.from_tuples(x.index.drop_duplicates()))
prices.iloc[:,1:] = temp1.unstack()[prices.iloc[:,1:].columns]

Output

  tenor     1M     3M     6M    12M
0    5Y  0.645  0.765  0.865  1.105
1   10Y  1.075  1.165  1.265  1.435
2   15Y  1.205  1.295  1.395  1.575
3   20Y  1.355  1.465  1.575   1.76
4   30Y  1.525  1.625  1.735   1.91

Answer 4

While applymap is nice and simple, it is unfortunately slow.

A more efficient vectorized solution is to split and explode, then groupby + mean:

(prices.set_index('tenor')
       .apply(lambda c: c.str.split('/').explode())
       .astype(float)
       .groupby(level=0, sort=False).mean()
)

Output

          1M     3M     6M    12M
tenor                            
5Y     0.645  0.765  0.865  1.105
10Y    1.075  1.165  1.265  1.435
15Y    1.205  1.295  1.395  1.575
20Y    1.355  1.465  1.575  1.760
30Y    1.525  1.625  1.735  1.910

This is about 8x faster on 50k rows.
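
The speed-up will depend on the machine and the pandas version, so it is worth measuring locally. Below is a rough benchmark sketch, not the timing code behind the figure above. The function names, the 10,000-row size and the generated `tenor` labels are all assumptions made for illustration. Note that the explode approach groups on the index, so the labels have to be unique, otherwise rows sharing a label would be averaged together.

```python
import timeit

import numpy as np
import pandas as pd

base = pd.DataFrame({
    'tenor': ['5Y', '10Y', '15Y', '20Y', '30Y'],
    '1M': ['0.67/0.62', '1.10/1.05', '1.23/1.18', '1.38/1.33', '1.55/1.50'],
    '3M': ['0.79/0.74', '1.19/1.14', '1.32/1.27', '1.49/1.44', '1.65/1.60'],
})

# Repeat the sample rows and give every row a unique label
big = pd.concat([base] * 2_000, ignore_index=True)
big['tenor'] = [f'T{i}' for i in range(len(big))]

def with_elementwise(df):
    body = df.set_index('tenor')
    # DataFrame.map from pandas 2.1, applymap before that
    elementwise = body.map if hasattr(body, 'map') else body.applymap
    return elementwise(lambda x: np.mean(list(map(float, x.split('/')))))

def with_explode(df):
    return (df.set_index('tenor')
              .apply(lambda c: c.str.split('/').explode())
              .astype(float)
              .groupby(level=0, sort=False).mean())

# Both approaches should agree before their timings are compared
same = np.allclose(with_elementwise(big).to_numpy(dtype=float),
                   with_explode(big).to_numpy(dtype=float))
print('results match:', same)

for fn in (with_elementwise, with_explode):
    seconds = timeit.timeit(lambda: fn(big), number=3) / 3
    print(f'{fn.__name__}: {seconds:.3f} s per run')
```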

Answer 5

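Another option, which avoids exploding the data and may help performance:

# Split every cell into a list of its two parts, without adding rows
temp = prices.set_index('tenor').transform(lambda col: col.str.split('/'))
# Pull out each side of the quote as a number
A = temp.transform(lambda col: pd.to_numeric(col.str[0]))
B = temp.transform(lambda col: pd.to_numeric(col.str[-1]))

# Element-wise mean of the two sides
A.add(B).div(2)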

         1M     3M     6M    12M
tenor
5Y     0.645  0.765  0.865  1.105
10Y    1.075  1.165  1.265  1.435
15Y    1.205  1.295  1.395  1.575
20Y    1.355  1.465  1.575  1.760
30Y    1.525  1.625  1.735  1.910

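Of course, if you have more entries, then explode is the better option.

Another option that can scale well is to do the string work in vanilla Python before the final processing in Pandas. We'll take advantage of Pandas' MultiIndex to get the final output: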

reshaped = pd.concat({key : pd.DataFrame(string.split('/') 
                                          for string in ent)  
                       for key, ent 
                       in prices.drop(columns='tenor').items()}, 
                       axis = 1)

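(reshaped
  .astype(float)
   # average the two halves of each column; transposing avoids
   # groupby(axis=1), which is deprecated in recent pandas
  .T
  .groupby(level=0, sort=False)
  .mean()
  .T
  .assign(tenor = prices.tenor)
   # you can ignore the line below,
   # if column order is not important
  .loc[:, [*prices]]
)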

  tenor     1M     3M     6M    12M
0    5Y  0.645  0.765  0.865  1.105
1   10Y  1.075  1.165  1.265  1.435
2   15Y  1.205  1.295  1.395  1.575
3   20Y  1.355  1.465  1.575  1.760
4   30Y  1.525  1.625  1.735  1.910

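Again, the goal here is to avoid exploding the dataframe, in the hope of getting more performance. You should get better performance by doing the string reshaping in Python (Pandas str methods are built on top of Python's string methods). As always, only tests can tell about performance.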

5 answers

Answer 1
1 2021-12-22 04:10:09

Answer 2
0 2021-12-22 04:11:02

Answer 3
0 2021-12-22 04:16:04

Answer 4
0 2021-12-22 04:46:15

Answer 5
0 2021-12-22 07:04:37
