
Python Lambda groupby to function and Stop Iteration

I wrote a function that I am able to successfully use like this:

createdata(df)

The results are correct: the output includes one result row for each row in the data frame, as expected, and the calculations are right. My problem is that, due to memory limitations, I cannot run the entire data frame through the function at once, so I have to send it through the function iteratively.

I cannot send the data frame to the function row by row because the algorithm ranks rows within each group, so I have to send at least one whole group at a time. I tried groupby.apply, but I got unexpected results because apply calls the function twice on the first group. So now I am using a lambda like this:

df.groupby(["x", "y"]).apply(lambda x: createdata(df))

With this I get correct calculations, but I get 4 identical rows of output for each input row. I also get a StopIteration exception when it finishes.

Without getting into the details of the function, is there something I can correct in my approach so that my function simply runs on one group of my data frame at a time?

I know zilch about pandas, but from a quick glance at your code and at the docs, I think you want to pass x to createdata, not df:

 df.groupby(["x", "y"]).apply(lambda x: createdata(x))
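To see why passing df instead of x multiplies the output, here is a minimal sketch. The createdata below is only a hypothetical stand-in (the real function was not posted), and the column names value and rank are made up for illustration. It assumes the real function returns one row per input row.

```python
import pandas as pd


def createdata(frame):
    # Hypothetical stand-in for the real function: returns one output
    # row per input row, with a rank computed over the frame it receives.
    out = frame.copy()
    out["rank"] = out["value"].rank()
    return out


df = pd.DataFrame(
    {
        "x": [1, 1, 2, 2],
        "y": ["a", "a", "b", "b"],
        "value": [10, 20, 30, 40],
    }
)

# The lambda ignores its argument, so the WHOLE frame is processed
# once per group: 2 groups x 4 rows = 8 output rows.
wrong = df.groupby(["x", "y"]).apply(lambda x: createdata(df))

# The lambda passes the group on, so each group is processed once:
# 4 output rows in total, ranked within each group.
right = df.groupby(["x", "y"]).apply(lambda x: createdata(x))

print(len(wrong), len(right))
```

If the real data has four (x, y) groups, the first form would produce four copies of every row, which would match the duplication described in the question. Depending on the pandas version, apply may also emit a warning about the grouping columns; that does not change the row counts here.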

Also, note that according to the doc:

In the current implementation apply calls func twice on the first group to decide whether it can take a fast or slow code path. This can lead to unexpected behavior if func has side-effects, as they will take effect twice for the first group.

http://pandas.pydata.org/pandas-docs/version/0.15.1/groupby.html#flexible-apply
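If the double call on the first group is a concern, one option is to skip apply and loop over the groups yourself. This is only a sketch under the same assumptions as before: createdata is a hypothetical stand-in and the column names are invented for illustration.

```python
import pandas as pd


def createdata(frame):
    # Hypothetical stand-in for the real function: one output row per
    # input row, ranked within the frame it is given.
    out = frame.copy()
    out["rank"] = out["value"].rank()
    return out


df = pd.DataFrame(
    {
        "x": [1, 1, 2, 2],
        "y": ["a", "a", "b", "b"],
        "value": [10, 20, 30, 40],
    }
)

pieces = []
# Iterating a GroupBy yields (key, sub-frame) pairs, so the function
# is called exactly once per group and only sees that group's rows.
for key, group in df.groupby(["x", "y"]):
    pieces.append(createdata(group))

# Stitch the per-group results back together in the original row order.
result = pd.concat(pieces).sort_index()
print(result)
```

With a loop like this you can also write each piece to disk instead of keeping everything in the pieces list, if the combined output is itself too large for memory.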
