Iterating through MultiIndex data in python pandas
I want to be able to iterate through a pandas DataFrame with grouping on a multi-index. Here, I'd like to be able to process a group of rows in each industry all together.
I load with a multi-index.
from io import StringIO  # Python 3; on Python 2 this was "from StringIO import StringIO"
import pandas as pd

data = """industry,location,number
retail,brazil,294
technology,china,100
retail,nyc,2913
retail,paris,382
technology,us,2182
"""
df = pd.read_csv(StringIO(data), sep=",", index_col=['industry', 'location'])
So I wish there was something to this effect:
for industry, rows in df.iter_multiindex():
    for row in rows:
        process_row(row)
Is there such a way to do this?
You can group by the first level of the multi-index (the industries), and then iterate through the groups:
In [102]: for name, group in df.groupby(level='industry'):
   .....:     print(name)
   .....:     print(group, '\n')
   .....:
retail
                   number
industry location
retail   brazil       294
         nyc         2913
         paris        382

technology
                     number
industry   location
technology china        100
           us          2182
group will each time be a DataFrame, and you can then iterate through that (with e.g. for row in group.iterrows()).
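To make the nested loop from the question concrete, here is a minimal sketch that combines groupby(level=...) with iterrows(). Note that process_row and the seen list are hypothetical stand-ins for whatever per-row work is actually needed, not something from the question or from pandas itself.

```python
from io import StringIO

import pandas as pd

data = """industry,location,number
retail,brazil,294
technology,china,100
retail,nyc,2913
retail,paris,382
technology,us,2182
"""
df = pd.read_csv(StringIO(data), index_col=["industry", "location"])

# Hypothetical stand-in for the question's process_row: it only records its inputs.
seen = []


def process_row(industry, location, number):
    seen.append((industry, location, number))


for industry, group in df.groupby(level="industry"):
    # iterrows() yields (index, Series) pairs; with a MultiIndex the index is a tuple.
    for (_, location), row in group.iterrows():
        process_row(industry, location, int(row["number"]))
```

Each group still carries both index levels, which is why the row index unpacks into an (industry, location) tuple.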
But, in most cases such iteration is not needed! What would process_row entail? Probably you can do this in a vectorized manner, directly on the groupby object.
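As an illustration of what "vectorized, directly on the groupby object" might look like, here is a small sketch. It assumes the per-row work boils down to a per-industry total and each row's share of it; the share column is made up for this example.

```python
from io import StringIO

import pandas as pd

data = """industry,location,number
retail,brazil,294
technology,china,100
retail,nyc,2913
retail,paris,382
technology,us,2182
"""
df = pd.read_csv(StringIO(data), index_col=["industry", "location"])

grouped = df.groupby(level="industry")["number"]

# One aggregated value per industry, no explicit Python loop.
totals = grouped.sum()

# transform() broadcasts the group total back onto every row of that group,
# so the division below is aligned row by row.
df["share"] = df["number"] / grouped.transform("sum")
```

Whether this fits depends on what process_row really has to do; anything expressible as an aggregation or a transform can usually skip the row loop.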
Not sure why you want to do this, but you can do it like this:
for x in df.index:
    print(x[0])         # industry
    process(df.loc[x])  # row
But that is not how you usually work with a DataFrame; you probably want to read about apply() (Essential Basic Functionality is also really helpful).
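For what it's worth, here is a rough sketch of how apply() could replace the explicit loop, both row by row and per group. The spread function and the label format are invented for illustration only.

```python
from io import StringIO

import pandas as pd

data = """industry,location,number
retail,brazil,294
technology,china,100
retail,nyc,2913
retail,paris,382
technology,us,2182
"""
df = pd.read_csv(StringIO(data), index_col=["industry", "location"])

# Row-wise apply: each row arrives as a Series whose .name is the
# (industry, location) index tuple.
labels = df.apply(
    lambda row: f"{row.name[0]}/{row.name[1]}: {row['number']}", axis=1
)


def spread(numbers):
    # Hypothetical per-group computation: difference between largest and smallest value.
    return numbers.max() - numbers.min()


# Group-wise apply: spread() receives the "number" values of one industry at a time.
spreads = df.groupby(level="industry")["number"].apply(spread)
```

Row-wise apply still calls a Python function once per row, so it is mostly a tidier loop; the group-wise form is usually the more useful one here.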