How to iterate over a pandas DataFrame using a list

I have the following pandas DataFrame:

import pandas as pd
foo = pd.DataFrame({'source': ['tomato', 'carrots', 'cheese'],
                    'tomato': [0.7, 0.4, 0.8],
                    'carrots': [0.15, 0.3, 0.1],
                    'cheese': [0.15, 0.3, 0.1]})

foo

    source  tomato  carrots  cheese
0   tomato     0.7     0.15    0.15
1  carrots     0.4     0.30    0.30
2   cheese     0.8     0.10    0.10

and the following list:

sequence = ['carrots','carrots','tomato','carrots']

sequence describes consecutive steps: carrots -> carrots -> tomato -> carrots, i.e. from carrots to carrots, then from carrots to tomato, and finally from tomato to carrots.
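Each step is a pair of neighbouring elements of the list. A minimal sketch of deriving those (from, to) pairs with zip; on Python 3.10+ itertools.pairwise(sequence) yields the same pairs:

```python
sequence = ['carrots', 'carrots', 'tomato', 'carrots']

# Pair every element with its successor: one (from, to) tuple per step
steps = list(zip(sequence[:-1], sequence[1:]))
print(steps)
```

A list of n names always gives n - 1 steps, so the four names above produce three factors in the product.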

I want to iterate over foo following this sequence and compute foo['carrots']['carrots']*foo['carrots']['tomato']*foo['tomato']['carrots']

where foo['carrots']['carrots'] denotes the element of foo in the row whose source is carrots and in the column carrots, i.e. 0.3.
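Note that foo[row][col] here is shorthand: in pandas, foo['carrots'] selects the carrots column, and the rows are labelled 0, 1, 2. A minimal sketch of making the lookup literal, assuming source holds unique names, is to move source into the index and use .loc[row, column]; the expected product is then 0.3 × 0.4 × 0.15 = 0.018:

```python
import pandas as pd

foo = pd.DataFrame({'source': ['tomato', 'carrots', 'cheese'],
                    'tomato': [0.7, 0.4, 0.8],
                    'carrots': [0.15, 0.3, 0.1],
                    'cheese': [0.15, 0.3, 0.1]})

# Use the source names as row labels so rows can be addressed by name
m = foo.set_index('source')

# .loc[row, column]: row = where the step starts, column = where it ends
a = m.loc['carrots', 'carrots']  # 0.3
b = m.loc['carrots', 'tomato']   # 0.4
c = m.loc['tomato', 'carrots']   # 0.15
print(a * b * c)
```

Floating-point rounding means the printed value may differ from 0.018 in the last digits.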

How can I do this efficiently?

Perhaps you can use zip:

In [1]: import pandas as pd
In [2]: foo = pd.DataFrame({
   ...:     'source': ['tomato', 'carrots', 'cheese'],
   ...:     'tomato': [0.7, 0.4, 0.8],
   ...:     'carrots': [0.15, 0.3, 0.1],
   ...:     'cheese': [0.15, 0.3, 0.1]
   ...: })
In [3]: foo
Out[3]: 
    source  tomato  carrots  cheese
0   tomato     0.7     0.15    0.15
1  carrots     0.4     0.30    0.30
2   cheese     0.8     0.10    0.10
In [4]: sequence = ['carrots', 'carrots', 'tomato', 'carrots']
   ...: product = 1
   ...: for row, col in zip(sequence[:-1], sequence[1:]):
   ...:     val = foo[foo.source == row].iloc[0][col]
   ...:     product *= val
   ...:     print(f'foo[{row}][{col}] = {val}')
   ...: 
   ...: 
foo[carrots][carrots] = 0.3
foo[carrots][tomato] = 0.4
foo[tomato][carrots] = 0.15
In [5]: product
Out[5]: 0.018
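The loop above filters the whole frame with foo.source == row at every step. A vectorized alternative, sketched under the assumption that every name in sequence appears both in source and among the columns, converts names to integer positions with Index.get_indexer and picks all values at once with NumPy fancy indexing:

```python
import numpy as np
import pandas as pd

foo = pd.DataFrame({'source': ['tomato', 'carrots', 'cheese'],
                    'tomato': [0.7, 0.4, 0.8],
                    'carrots': [0.15, 0.3, 0.1],
                    'cheese': [0.15, 0.3, 0.1]})
sequence = ['carrots', 'carrots', 'tomato', 'carrots']

m = foo.set_index('source')

# Integer positions of the "from" rows and the "to" columns; -1 means not found
rows = m.index.get_indexer(sequence[:-1])
cols = m.columns.get_indexer(sequence[1:])
if (rows == -1).any() or (cols == -1).any():
    raise KeyError('sequence contains a name missing from foo')

# Fancy indexing picks element (rows[i], cols[i]) for each step i
values = m.to_numpy()[rows, cols]
product = values.prod()
print(values, product)
```

get_indexer requires unique labels, which holds here because each source name occurs once.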

I create a MultiIndex Series, then select the consecutive pairs from the sequence with Series.reindex and take the product of all the values:

s = foo.set_index('source').stack()

print (s)
source          
tomato   tomato     0.70
         carrots    0.15
         cheese     0.15
carrots  tomato     0.40
         carrots    0.30
         cheese     0.30
cheese   tomato     0.80
         carrots    0.10
         cheese     0.10
dtype: float64

sequence = ['carrots','carrots','tomato','carrots']

print (list(zip(sequence[:-1], sequence[1:])))
[('carrots', 'carrots'), ('carrots', 'tomato'), ('tomato', 'carrots')]

print (s.reindex(list(zip(sequence[:-1], sequence[1:]))))
source          
carrots  carrots    0.30
         tomato     0.40
tomato   carrots    0.15
dtype: float64


out = s.reindex(list(zip(sequence[:-1], sequence[1:]))).prod()
print (out)
0.018
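One caveat with reindex: a (source, target) pair that does not exist in s becomes NaN, and Series.prod skips NaN by default (skipna=True), so a misspelled name in sequence would silently drop a step. A hedged sketch of a stricter variant; the helper name sequence_product is hypothetical:

```python
import pandas as pd

foo = pd.DataFrame({'source': ['tomato', 'carrots', 'cheese'],
                    'tomato': [0.7, 0.4, 0.8],
                    'carrots': [0.15, 0.3, 0.1],
                    'cheese': [0.15, 0.3, 0.1]})
s = foo.set_index('source').stack()

def sequence_product(s, sequence):
    """Multiply the values of consecutive (from, to) pairs (hypothetical helper)."""
    pairs = list(zip(sequence[:-1], sequence[1:]))
    vals = s.reindex(pairs)
    # Unknown pairs come back as NaN; a stored NaN value would also be flagged
    missing = vals[vals.isna()].index.tolist()
    if missing:
        raise KeyError(f'pairs not found in s: {missing}')
    return vals.prod()

print(sequence_product(s, ['carrots', 'carrots', 'tomato', 'carrots']))
```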


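Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.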

© 2020-2024 STACKOOM.COM