How do I iterate over DataFrame Groupby after applying size()?
Combing through log files, I build a dataframe of the process that failed, the date and the machine. My goal is to provide a bar chart for each process where the dates are the x-axis and the count of failures each day is computed with .size().
grouped = fail_df.groupby(['Process', 'Date']).size
print(fail_df.groupby(['Process', 'Date']).size())
shows exactly what I want. The first lines of the print are
Process   Date
10HzTail  2019-06-16     1
1553Prox  2019-06-16     3
          2019-06-17     8
          2019-06-18    10
          2019-06-19     2
          2019-06-20     5
Cthread2  2019-06-18     1
          2019-06-20     1
I try to iterate as
for name, row in grouped:
    print(name)
    print(row)
which gives this error output
dtype: int64
Traceback (most recent call last):
  File "./allpandas", line 140, in <module>
    main()
  File "./allpandas", line 125, in main
    for name, row in grouped:
TypeError: 'int' object is not iterable
I want to process each Process in turn, feeding its dates and counts to the bar chart for that Process.
Is there a way to iterate over this or have I made a fundamental mistake in my grouping?
UPDATE
I tried the suggested size() and still get the same error.
grouped = fail_df.groupby(['Process', 'Date']).size()
for name, row in grouped:
    print(name)
    print(row)
Are there other suggestions?
Are you using matplotlib?
If so, and if I understood what you want, you don't need to loop: you can use the pandas plot method (Series.plot here, since grouped is a Series), which does the whole job for you.
grouped = fail_df.groupby(['Process', 'Date']).size()
axis = grouped.plot(kind='bar')
plt.show()
Where plt is the usual import matplotlib.pyplot as plt.
You may need to adjust the labels at the bottom of the bars if they are too long.
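Since the question asks for one chart per process rather than a single combined chart, one possible approach is to split the counts on the Process index level and plot each piece on its own axes. This is only a sketch: the sample fail_df, the figures dictionary, and the axis labels are assumptions made up for illustration, and the Agg backend is selected only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; omit for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data standing in for the asker's fail_df.
fail_df = pd.DataFrame({
    "Process": ["10HzTail", "1553Prox", "1553Prox", "1553Prox", "Cthread2"],
    "Date": ["2019-06-16", "2019-06-16", "2019-06-16", "2019-06-17", "2019-06-18"],
    "Machine": ["m1", "m1", "m2", "m1", "m3"],
})

grouped = fail_df.groupby(["Process", "Date"]).size()

figures = {}  # hypothetical container: one figure per process
for process, counts in grouped.groupby(level="Process"):
    fig, ax = plt.subplots()
    # Drop the constant Process level so the x-axis shows only the dates.
    counts.droplevel("Process").plot(kind="bar", ax=ax, title=process, rot=45)
    ax.set_xlabel("Date")
    ax.set_ylabel("Failures")
    fig.tight_layout()
    figures[process] = fig
```

Each entry in figures can then be shown with plt.show() or saved with fig.savefig(...), depending on how the charts are to be delivered.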
Your error comes from the fact that grouped is a Series and not a DataFrame, so you cannot iterate over it that way. Iterating over a Series yields only the values, so each step produces a single int, which cannot be unpacked into name and row. You should do:
for value in grouped:
    print(value)
to see the sizes, but you lose the index labels. To get the index labels as well, the solution is:
for name, row in zip(grouped.index, grouped):
    print(name)
    print(row)
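An equivalent spelling of the zip loop, arguably a little more idiomatic, is Series.items(), which yields (index label, value) pairs directly. Because the index here has two levels, each label is a (Process, Date) tuple that can be unpacked in the loop header. A minimal sketch, assuming a small made-up fail_df in the shape described in the question:

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's fail_df.
fail_df = pd.DataFrame({
    "Process": ["10HzTail", "1553Prox", "1553Prox", "1553Prox", "Cthread2"],
    "Date": ["2019-06-16", "2019-06-16", "2019-06-16", "2019-06-17", "2019-06-18"],
    "Machine": ["m1", "m1", "m2", "m1", "m3"],
})

grouped = fail_df.groupby(["Process", "Date"]).size()

rows = []  # collected only so the result can be inspected afterwards
# items() pairs each MultiIndex label with its count.
for (process, date), count in grouped.items():
    print(process, date, count)
    rows.append((process, date, int(count)))
```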