
Pandas eval with multi-index dataframes

Consider a DataFrame df with multi-index columns:

A       bar                flux          
B       one     three       six     three
x  0.627915  0.507184  0.690787  1.166318
y  0.927342  0.788232  1.776677 -0.512259
z  1.000000  1.000000  1.000000  0.000000

I would like to use eval to subtract ('bar', 'one') from ('flux', 'six'). Does the eval syntax support this type of index?

You can do this without eval, using the equivalent standard Python notation:

df['bar']['one'] - df['flux']['six']
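As a small illustration of the notation above, the sketch below rebuilds the frame from the question and compares the chained lookup with a single tuple-key lookup. The variable names `chained` and `direct` are made up for this example; the tuple form is simply another way to select the same column and is not part of the original answer.

```python
import pandas as pd

# Rebuild the frame from the question (two column levels named A and B).
columns = pd.MultiIndex.from_tuples(
    [("bar", "one"), ("bar", "three"), ("flux", "six"), ("flux", "three")],
    names=["A", "B"],
)
df = pd.DataFrame(
    [
        [0.627915, 0.507184, 0.690787, 1.166318],
        [0.927342, 0.788232, 1.776677, -0.512259],
        [1.000000, 1.000000, 1.000000, 0.000000],
    ],
    index=["x", "y", "z"],
    columns=columns,
)

# Chained lookup: select the 'bar' block first, then column 'one' inside it.
chained = df["bar"]["one"] - df["flux"]["six"]

# Tuple-key lookup: select each column in a single step.
direct = df[("bar", "one")] - df[("flux", "six")]

print(direct)
```

Both forms select the same data for reading. The tuple key avoids creating an intermediate sub-frame, which tends to matter more when assigning than when reading.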

Take a look at this reference. Here is an example based on the object in your question:

from pandas import DataFrame, MultiIndex

# Create the object
columns = [
    ('bar', 'one'),
    ('bar', 'three'),
    ('flux', 'six'),
    ('flux', 'three')
]
data    = [
    [0.627915, 0.507184, 0.690787, 1.166318],
    [0.927342, 0.788232, 1.776677, -0.512259],
    [1.000000, 1.000000, 1.000000, 0.000000]
]
index   = MultiIndex.from_tuples(columns, names=['A', 'B'])
df      = DataFrame(data, index=['x', 'y', 'z'], columns=index)

# Calculate the difference
sub = df['bar']['one'] - df['flux']['six']
print(sub)

# Assign that difference to a new column in the object
df['new', 'col'] = sub
print(df)

The resulting DataFrame is:

A       bar                flux                 new
B       one     three       six     three       col
x  0.627915  0.507184  0.690787  1.166318 -0.062872
y  0.927342  0.788232  1.776677 -0.512259 -0.849335
z  1.000000  1.000000  1.000000  0.000000  0.000000
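If an eval-style expression is still wanted, one possible approach is to bind the two columns to plain names and pass them to the top-level `pandas.eval` through its `local_dict` argument. This is only a sketch: the names `bar_one` and `flux_six` are invented for the example, and it sidesteps tuple keys inside the expression rather than making eval understand them.

```python
import pandas as pd

# Same frame as in the question.
columns = pd.MultiIndex.from_tuples(
    [("bar", "one"), ("bar", "three"), ("flux", "six"), ("flux", "three")],
    names=["A", "B"],
)
df = pd.DataFrame(
    [
        [0.627915, 0.507184, 0.690787, 1.166318],
        [0.927342, 0.788232, 1.776677, -0.512259],
        [1.000000, 1.000000, 1.000000, 0.000000],
    ],
    index=["x", "y", "z"],
    columns=columns,
)

# Give each multi-index column a simple identifier the expression can use.
# (bar_one and flux_six are hypothetical names chosen for this sketch.)
names = {
    "bar_one": df[("bar", "one")],
    "flux_six": df[("flux", "six")],
}

# Evaluate the expression against those names only.
result = pd.eval("bar_one - flux_six", local_dict=names)

# Store the result under a new two-level column key.
df[("new", "col")] = result
print(df)
```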

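Below is an example workaround that lets you use tuple indexes in the DataFrame eval function. I know this is an old question, but I could not find a good answer to the original question.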

from pandas import DataFrame, MultiIndex
import re

LEVEL_DELIMITER = "___"

def tuples_to_str(t):
    return LEVEL_DELIMITER.join(t)

def str_to_tuples(s):
    return tuple(s.split(LEVEL_DELIMITER))

def flatten_mi_var_expression(e):
    # Find match to multi-index variables and flatten
    tuple_re = r'\(.*?,.*?\)'
    for tuple_str in re.findall(tuple_re, e):
        e = e.replace(tuple_str, tuples_to_str(eval(tuple_str)))
    return e

# Create the object
columns = [
    ('bar', 'one'),
    ('bar', 'three'),
    ('flux', 'six'),
    ('flux', 'three')
]
data = [
    [0.627915, 0.507184, 0.690787, 1.166318],
    [0.927342, 0.788232, 1.776677, -0.512259],
    [1.000000, 1.000000, 1.000000, 0.000000]
]
index = MultiIndex.from_tuples(columns, names=['A', 'B'])
df = DataFrame(data, index=['x', 'y', 'z'], columns=index)

# Desired multi-index variable expression (using tuple indexes)
new_col = ('new', 'col')
mi_expression = f"{new_col} = {('flux', 'six')} + {('bar', 'one')}"

# Capture the original multi-index column object
mi_cols = df.columns

# Flatten the multi-index columns
df.columns = [LEVEL_DELIMITER.join(col) for col in df.columns.values]

# Convert multi-index variable expression to flattened indexing
flat_expression = flatten_mi_var_expression(mi_expression)

# Evaluate
df.eval(flat_expression, inplace=True)

# Append the new column to the original multi-index instance and assign to the DataFrame
df.columns = MultiIndex.from_tuples(mi_cols.tolist() + [new_col], names=mi_cols.names)

print(df)

This should produce the following.

A       bar                flux                 new
B       one     three       six     three       col
x  0.627915  0.507184  0.690787  1.166318  1.318702
y  0.927342  0.788232  1.776677 -0.512259  2.704019
z  1.000000  1.000000  1.000000  0.000000  2.000000

I am not sure about the safety of using Python's eval here (it is not strictly needed), but this example seems to work.
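On the safety concern, one option is to parse the tuple literals with `ast.literal_eval`, which only accepts Python literals and does not execute arbitrary code. The sketch below is a possible variant of `flatten_mi_var_expression`, not the original author's code; the stricter regular expression and the helper name `to_flat_name` are assumptions made for this example.

```python
import ast
import re

LEVEL_DELIMITER = "___"

# Match a parenthesised group that contains a comma and no nested parentheses.
TUPLE_RE = re.compile(r"\([^()]*,[^()]*\)")


def flatten_mi_var_expression(expression):
    """Replace tuple literals such as ('flux', 'six') with flux___six."""

    def to_flat_name(match):
        # literal_eval parses the tuple text without running any code.
        levels = ast.literal_eval(match.group(0))
        return LEVEL_DELIMITER.join(levels)

    return TUPLE_RE.sub(to_flat_name, expression)


flat = flatten_mi_var_expression("('new', 'col') = ('flux', 'six') + ('bar', 'one')")
print(flat)  # new___col = flux___six + bar___one
```

The rest of the workaround (flattening the columns, calling `df.eval`, and restoring the multi-index) would stay the same with this variant.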
