Pandas eval with multi-index dataframes
Consider a multi-index dataframe df:
A       bar                flux
B       one     three       six     three
x  0.627915  0.507184  0.690787  1.166318
y  0.927342  0.788232  1.776677 -0.512259
z  1.000000  1.000000  1.000000  0.000000
I would like to use eval to subtract ('bar', 'one') from ('flux', 'six'). Does the eval syntax support this type of index?
You can do this without using eval by using the equivalent standard Python notation:
df['bar']['one'] - df['flux']['six']
Note that this computes ('bar', 'one') minus ('flux', 'six'); swap the operands to get the order asked for in the question.
Take a look at this reference. Below is an example based on the object in your question:
from pandas import DataFrame, MultiIndex

# Create the object
columns = [
    ('bar', 'one'),
    ('bar', 'three'),
    ('flux', 'six'),
    ('flux', 'three')
]
data = [
    [0.627915, 0.507184, 0.690787, 1.166318],
    [0.927342, 0.788232, 1.776677, -0.512259],
    [1.000000, 1.000000, 1.000000, 0.000000]
]
index = MultiIndex.from_tuples(columns, names=['A', 'B'])
df = DataFrame(data, index=['x', 'y', 'z'], columns=index)

# Calculate the difference
sub = df['bar']['one'] - df['flux']['six']
print(sub)

# Assign that difference to a new column in the object
df['new', 'col'] = sub
print(df)
The corresponding result is:
A       bar                flux                 new
B       one     three       six     three       col
x  0.627915  0.507184  0.690787  1.166318 -0.062872
y  0.927342  0.788232  1.776677 -0.512259 -0.849335
z  1.000000  1.000000  1.000000  0.000000  0.000000
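For what it's worth, here is a minimal sketch of the same subtraction in the order the question asks for, ('flux', 'six') minus ('bar', 'one'). It selects each column with a single tuple key instead of chained indexing, and, if eval is still wanted, hands the two Series to the top-level pandas.eval through its local_dict argument. The names flux_six, bar_one and diff are my own, not from the question, and I have not compared performance against plain subtraction.

```python
import pandas as pd

# Rebuild the frame from the question
columns = pd.MultiIndex.from_tuples(
    [("bar", "one"), ("bar", "three"), ("flux", "six"), ("flux", "three")],
    names=["A", "B"],
)
data = [
    [0.627915, 0.507184, 0.690787, 1.166318],
    [0.927342, 0.788232, 1.776677, -0.512259],
    [1.000000, 1.000000, 1.000000, 0.000000],
]
df = pd.DataFrame(data, index=["x", "y", "z"], columns=columns)

# A single tuple key selects one column of a multi-index frame
flux_six = df[("flux", "six")]
bar_one = df[("bar", "one")]

# pandas.eval only sees plain names, so the Series are passed in by name
# (flux_six and bar_one are illustrative names, not part of the question)
diff = pd.eval(
    "flux_six - bar_one",
    local_dict={"flux_six": flux_six, "bar_one": bar_one},
)

# Store the result under a two-level column key
df[("new", "col")] = diff
print(df)
```

The tuple key keeps the selection in one step, and the expression string itself never has to contain a tuple, which is the part eval has trouble with.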
Here's an example of a workaround that allows you to use tuple indexing in the DataFrame eval function. I know this is an old question, but I couldn't find a good answer to it.
from pandas import DataFrame, MultiIndex
import re

LEVEL_DELIMITER = "___"


def tuples_to_str(t):
    # ('flux', 'six') -> 'flux___six'
    return LEVEL_DELIMITER.join(t)


def str_to_tuples(s):
    # 'flux___six' -> ('flux', 'six')
    return tuple(s.split(LEVEL_DELIMITER))


def flatten_mi_var_expression(e):
    # Find match to multi-index variables and flatten
    tuple_re = r'\(.*?,.*?\)'
    for tuple_str in re.findall(tuple_re, e):
        # Python's eval turns the tuple text back into a tuple object
        e = e.replace(tuple_str, tuples_to_str(eval(tuple_str)))
    return e


# Create the object
columns = [
    ('bar', 'one'),
    ('bar', 'three'),
    ('flux', 'six'),
    ('flux', 'three')
]
data = [
    [0.627915, 0.507184, 0.690787, 1.166318],
    [0.927342, 0.788232, 1.776677, -0.512259],
    [1.000000, 1.000000, 1.000000, 0.000000]
]
index = MultiIndex.from_tuples(columns, names=['A', 'B'])
df = DataFrame(data, index=['x', 'y', 'z'], columns=index)

# Desired multi-index variable expression (using tuple indexes)
new_col = ('new', 'col')
mi_expression = f"{new_col} = {('flux', 'six')} + {('bar', 'one')}"

# Capture the original multi-index column object
mi_cols = df.columns

# Flatten the multi-index columns
df.columns = [LEVEL_DELIMITER.join(col) for col in df.columns.values]

# Convert multi-index variable expression to flattened indexing
flat_expression = flatten_mi_var_expression(mi_expression)

# Evaluate
df.eval(flat_expression, inplace=True)

# Append the new column to the original multi-index instance and assign to the DataFrame
df.columns = MultiIndex.from_tuples(mi_cols.tolist() + [new_col], names=mi_cols.names)
print(df)
This should provide the following.
A       bar                flux                 new
B       one     three       six     three       col
x  0.627915  0.507184  0.690787  1.166318  1.318702
y  0.927342  0.788232  1.776677 -0.512259  2.704019
z  1.000000  1.000000  1.000000  0.000000  2.000000
Not sure how safe this is, given that it relies on Python's eval (which really isn't needed), but this example seems to work.
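If the Python eval call is the worry, one possible way around it is ast.literal_eval, which only accepts literals (strings, numbers, tuples and so on) and raises ValueError for anything else. Below is a rough sketch of the flattening helper rewritten that way; it keeps the same regex and the same "___" delimiter, uses re.sub with a callback instead of a replace loop, and still assumes every column level is a string. The helper name to_flat_name is made up for this sketch.

```python
import ast
import re

LEVEL_DELIMITER = "___"
# Same pattern as the workaround: a parenthesised group containing a comma
TUPLE_RE = re.compile(r"\(.*?,.*?\)")


def flatten_mi_var_expression(expression):
    """Replace each tuple literal in the expression with a flat column name."""

    def to_flat_name(match):
        # literal_eval parses the tuple text without executing any code;
        # it raises ValueError if the text is not a plain literal
        levels = ast.literal_eval(match.group(0))
        return LEVEL_DELIMITER.join(levels)

    return TUPLE_RE.sub(to_flat_name, expression)


flat = flatten_mi_var_expression(
    "('new', 'col') = ('flux', 'six') + ('bar', 'one')"
)
print(flat)  # new___col = flux___six + bar___one
```

The flattened string can then be passed to df.eval exactly as in the workaround, once the frame's columns have been flattened with the same delimiter.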