Pandas eval with multi-index dataframes
Consider a multi-index DataFrame df:
A       bar                flux
B       one     three       six     three
x  0.627915  0.507184  0.690787  1.166318
y  0.927342  0.788232  1.776677 -0.512259
z  1.000000  1.000000  1.000000  0.000000
I would like to use eval to subtract ('bar', 'one') from ('flux', 'six'). Does the eval syntax support this type of indexing?
You can do this without eval, using the equivalent standard Python notation:
df['bar']['one'] - df['flux']['six']
See this reference. Here is your example, based on the objects in your question:
from pandas import DataFrame, MultiIndex

# Create the object
columns = [
    ('bar', 'one'),
    ('bar', 'three'),
    ('flux', 'six'),
    ('flux', 'three')
]
data = [
    [0.627915, 0.507184, 0.690787, 1.166318],
    [0.927342, 0.788232, 1.776677, -0.512259],
    [1.000000, 1.000000, 1.000000, 0.000000]
]
index = MultiIndex.from_tuples(columns, names=['A', 'B'])
df = DataFrame(data, index=['x', 'y', 'z'], columns=index)

# Calculate the difference
sub = df['bar']['one'] - df['flux']['six']
print(sub)

# Assign that difference to a new column in the object
df['new', 'col'] = sub
print(df)
The corresponding result is:
A       bar                flux                 new
B       one     three       six     three       col
x  0.627915  0.507184  0.690787  1.166318 -0.062872
y  0.927342  0.788232  1.776677 -0.512259 -0.849335
z  1.000000  1.000000  1.000000  0.000000  0.000000
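The chained lookup df['bar']['one'] used above can also be written as a single tuple lookup, which selects the column in one step. The following is a minimal sketch that reuses the data from the question. The name `diff` is introduced here only for illustration, and the operands are ordered so that ('bar', 'one') is subtracted from ('flux', 'six'), as the question asks.

```python
from pandas import DataFrame, MultiIndex

# Same object as in the question
columns = MultiIndex.from_tuples(
    [('bar', 'one'), ('bar', 'three'), ('flux', 'six'), ('flux', 'three')],
    names=['A', 'B'],
)
data = [
    [0.627915, 0.507184, 0.690787, 1.166318],
    [0.927342, 0.788232, 1.776677, -0.512259],
    [1.000000, 1.000000, 1.000000, 0.000000],
]
df = DataFrame(data, index=['x', 'y', 'z'], columns=columns)

# One tuple per lookup instead of the chained df['flux']['six']
diff = df[('flux', 'six')] - df[('bar', 'one')]

# A tuple key on assignment adds the column under a new top-level label
df[('new', 'col')] = diff
print(df)
```

With this operand order the signs come out opposite to the table above.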
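Below is an example workaround that lets you use tuple indexes inside the DataFrame eval function. I know this is an old question, but I could not find a good answer to the original question.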
from pandas import DataFrame, MultiIndex
import re

LEVEL_DELIMITER = "___"

def tuples_to_str(t):
    return LEVEL_DELIMITER.join(t)

def str_to_tuples(s):
    return tuple(s.split(LEVEL_DELIMITER))

def flatten_mi_var_expression(e):
    # Find match to multi-index variables and flatten
    tuple_re = r'\(.*?,.*?\)'
    for tuple_str in re.findall(tuple_re, e):
        e = e.replace(tuple_str, tuples_to_str(eval(tuple_str)))
    return e

# Create the object
columns = [
    ('bar', 'one'),
    ('bar', 'three'),
    ('flux', 'six'),
    ('flux', 'three')
]
data = [
    [0.627915, 0.507184, 0.690787, 1.166318],
    [0.927342, 0.788232, 1.776677, -0.512259],
    [1.000000, 1.000000, 1.000000, 0.000000]
]
index = MultiIndex.from_tuples(columns, names=['A', 'B'])
df = DataFrame(data, index=['x', 'y', 'z'], columns=index)

# Desired multi-index variable expression (using tuple indexes)
new_col = ('new', 'col')
mi_expression = f"{new_col} = {('flux', 'six')} + {('bar', 'one')}"

# Capture the original multi-index column object
mi_cols = df.columns

# Flatten the multi-index columns
df.columns = [LEVEL_DELIMITER.join(col) for col in df.columns.values]

# Convert multi-index variable expression to flattened indexing
flat_expression = flatten_mi_var_expression(mi_expression)

# Evaluate
df.eval(flat_expression, inplace=True)

# Append the new column to the original multi-index instance and assign to the DataFrame
df.columns = MultiIndex.from_tuples(mi_cols.tolist() + [new_col], names=mi_cols.names)
print(df)
This should give the following.
A       bar                flux                 new
B       one     three       six     three       col
x  0.627915  0.507184  0.690787  1.166318  1.318702
y  0.927342  0.788232  1.776677 -0.512259  2.704019
z  1.000000  1.000000  1.000000  0.000000  2.000000
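If the goal is only to run the arithmetic through pandas' expression engine, one possible alternative that avoids renaming the columns is to pull the two columns out as Series and pass them to the top-level pandas.eval under plain names. This is a sketch under assumptions and not part of the workaround above: the names `six`, `one` and `operands` are made up for illustration, and the Series are handed over explicitly through the `local_dict` parameter.

```python
import pandas as pd

# Same object as in the question
columns = pd.MultiIndex.from_tuples(
    [('bar', 'one'), ('bar', 'three'), ('flux', 'six'), ('flux', 'three')],
    names=['A', 'B'],
)
data = [
    [0.627915, 0.507184, 0.690787, 1.166318],
    [0.927342, 0.788232, 1.776677, -0.512259],
    [1.000000, 1.000000, 1.000000, 0.000000],
]
df = pd.DataFrame(data, index=['x', 'y', 'z'], columns=columns)

# Bind each multi-index column to a plain identifier that the
# expression parser can handle (hypothetical names).
operands = {
    'six': df[('flux', 'six')],
    'one': df[('bar', 'one')],
}

# Same sum as in the workaround above, evaluated by pandas.eval
result = pd.eval("six + one", local_dict=operands)

# Store the result under a tuple key to keep the multi-index intact
df[('new', 'col')] = result
print(df)
```

The trade-off is that the expression string no longer mentions the tuples themselves, so the mapping from names to columns has to be kept next to it.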
I am not sure how safe it is to use Python's built-in eval here (it is not strictly needed), but this example seems to work.
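On the safety point: the built-in eval is used above only to turn text such as ('flux', 'six') back into a tuple. The standard library's ast.literal_eval does the same for plain literals and rejects anything else, such as function calls, so it looks like a possible drop-in here. A sketch under that assumption, keeping the same delimiter and regular expression as above:

```python
import ast
import re

LEVEL_DELIMITER = "___"

def flatten_mi_var_expression(e):
    """Replace each "('a', 'b')" tuple in the expression with "a___b"."""
    tuple_re = r'\(.*?,.*?\)'
    for tuple_str in re.findall(tuple_re, e):
        # literal_eval only parses Python literals; it raises an
        # exception instead of executing arbitrary code.
        levels = ast.literal_eval(tuple_str)
        e = e.replace(tuple_str, LEVEL_DELIMITER.join(levels))
    return e

flat = flatten_mi_var_expression(
    "('new', 'col') = ('flux', 'six') + ('bar', 'one')"
)
print(flat)  # new___col = flux___six + bar___one
```

The rest of the workaround would stay the same, since only the text-to-tuple step changes.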