Extracting/appending pandas dataframe rows which meet a complex condition involving multiple columns

I'm having trouble understanding how looping through a dataframe works.

I found somewhere that if you write:

for row in df.iterrows():

you won't be able to access row['column1']; instead you'll have to use

for index, row in df.iterrows():

and then it works.
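To illustrate the point about unpacking, here is a minimal sketch on a small made-up frame (the name df and its columns are placeholders, not data from the question). iterrows() yields (index, row) tuples, which is why a single loop variable cannot be indexed by column name:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame, only for illustration
df = pd.DataFrame(np.arange(6).reshape(3, 2), columns=['column1', 'column2'])

# With one loop variable, each item is a tuple: (index label, row as a Series)
first = next(df.iterrows())
print(type(first))

# Unpacking as index, row gives a Series that can be indexed by column name
for index, row in df.iterrows():
    print(index, row['column1'])
```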

Now I want to collect the signals I find in the loop by adding each row to a new dataframe with newdf.append(row). This works, but the result loses the ability to be referenced by a string. How do I have to add those rows to my dataframe for that to work?

Detailed code:

import numpy as np
from pandas import DataFrame

dataframe1 = DataFrame(np.random.randn(10, 5), columns=['a','b','c', 'd', 'e'])
dataframe2 = DataFrame()

for index, row in dataframe1.iterrows():
    if row['a'] == 5:
        dataframe2.append(row)

print(dataframe2['b'])

This doesn't work, because dataframe2 won't accept strings inside the brackets. Yes, this could be done more easily, but for the sake of argument let's say it couldn't (the real logic is more complex than one if).

In my real code there are about ten different ifs and elses determining what to do with each row (and other work is done inside the loop). I'm not talking about filtering, just about adding the row to a new dataframe in a way that preserves the column index, so that I can reference a column by its name.

In pandas, it is pretty straightforward to filter and, if needed, pass the results to a new dataframe, just as @smci suggests for R.

import numpy as np
import pandas as pd

dataframe1 = pd.DataFrame(np.random.randn(10, 5), columns=['a','b','c', 'd', 'e'])
dataframe1.head()

          a         b         c         d         e
0 -2.824391 -0.143400 -0.936304  0.056744 -1.958325
1 -1.116849  0.010941 -1.146384  0.034521 -3.239772
2 -2.026315  0.600607  0.071682 -0.925031  0.575723
3  0.088351  0.912125  0.770396  1.148878  0.230025
4 -0.954288 -0.526195  0.811891  0.558740 -2.025363

Then, to filter, you can do this (.loc replaces the old .ix indexer, which has been removed from pandas):

dataframe2 = dataframe1.loc[dataframe1.a > .5]
dataframe2.head()

         a         b         c         d         e
0  0.708511  0.282347  0.831361  0.331655 -2.328759
1  1.646602 -0.090472 -0.074580 -0.272876 -0.647686
8  2.728552 -0.481700  0.338771  0.848957 -0.118124
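The same filtering idea extends to conditions that involve several columns. The sketch below is only an illustration: the particular thresholds on b and c are invented, not taken from the question. Each comparison produces a boolean Series, and these can be combined with & (and), | (or) and ~ (not):

```python
import numpy as np
import pandas as pd

np.random.seed(123)
dataframe1 = pd.DataFrame(np.random.randn(10, 5), columns=['a', 'b', 'c', 'd', 'e'])

# Hypothetical multi-column condition. The parentheses are required,
# because & and | bind more tightly than the comparison operators.
mask = (dataframe1['a'] > .5) & ((dataframe1['b'] < 0) | (dataframe1['c'] > 1))

# Boolean indexing keeps the original index labels and all column names
dataframe2 = dataframe1[mask]
print(dataframe2['b'])
```

Because the result is an ordinary dataframe, dataframe2['b'] works exactly as it does on dataframe1.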

EDIT

OP didn't want to use a filter, so here is an example iterating through rows instead:

np.random.seed(123)
dataframe1 = pd.DataFrame(np.random.randn(10, 5), columns=['a','b','c', 'd', 'e'])
## I declare the second df with the same structure
dataframe2 = pd.DataFrame(columns=['a','b','c', 'd', 'e'])

For the loop I use iterrows, and instead of appending to an empty dataframe, I use the index from the iterator to place each row at the same index position in the empty frame. Notice that I used > .5 instead of == 5; otherwise the resulting dataframe would be empty for sure.

for index, row in dataframe1.iterrows():
    if row['a'] > .5:
        dataframe2.loc[index] = row

dataframe2

          a         b         c         d         e
1  1.651437 -2.426679 -0.428913  1.265936 -0.866740
4  0.737369  1.490732 -0.935834  1.175829 -1.253881
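An alternative worth considering, if the loop has to stay, is to collect the selected rows in a plain Python list and build the new dataframe once at the end. This is a sketch rather than the only way to do it; the name selected is made up for the example. It also sidesteps DataFrame.append, which returned a new frame instead of modifying the existing one and was removed in pandas 2.0:

```python
import numpy as np
import pandas as pd

np.random.seed(123)
dataframe1 = pd.DataFrame(np.random.randn(10, 5), columns=['a', 'b', 'c', 'd', 'e'])

selected = []  # rows chosen by the loop (each one is a Series)
for index, row in dataframe1.iterrows():
    if row['a'] > .5:
        selected.append(row)  # list.append works in place

# Each Series becomes one row; its name becomes the index label.
# Passing columns= keeps the column names even if nothing was selected.
dataframe2 = pd.DataFrame(selected, columns=dataframe1.columns)
print(dataframe2['b'])
```

Building the frame once is usually cheaper than growing it row by row, and the columns can still be referenced by name.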

UPDATE:

Don't. The solution is:

dataframe1[dataframe1.a > .5]
# or, if you only want the 'b' column
dataframe1[dataframe1.a > .5]['b']

You only want to filter for rows where a == 5 (and then select the b column?). You still have shown no reason whatsoever why you need to append to dataframe1. In fact you don't need to append anything; you can generate the filtered version directly.
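If the decision for each row really is a chain of ifs and elses, one possible approach is to put that logic in a function that returns True or False and use it to build a mask. The function keep_row and its rules below are invented for illustration; the real conditions would go in its place:

```python
import numpy as np
import pandas as pd

np.random.seed(123)
dataframe1 = pd.DataFrame(np.random.randn(10, 5), columns=['a', 'b', 'c', 'd', 'e'])

def keep_row(row):
    """Hypothetical rules standing in for the real chain of ifs and elses."""
    if row['a'] > .5:
        return bool(row['b'] < 0)
    elif row['c'] > 1:
        return True
    return bool(row['d'] > 1 and row['e'] > 0)

# axis=1 calls keep_row once per row and returns a boolean Series
mask = dataframe1.apply(keep_row, axis=1)
dataframe2 = dataframe1[mask]
print(dataframe2['b'])
```

This is still a row-by-row pass, so it is no faster than iterrows, but the result is a filtered dataframe with its columns intact and nothing has to be appended.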

ORIGINAL VERSION:

Don't.

If all you want to do is compute aggregations or summaries and they don't really belong in the parent dataframe, do a filter. Assign the result to a separate dataframe.
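As a rough sketch of that filter-then-summarise pattern (the choice of mean and max is arbitrary, and the names filtered and summary are made up for the example):

```python
import numpy as np
import pandas as pd

np.random.seed(123)
dataframe1 = pd.DataFrame(np.random.randn(10, 5), columns=['a', 'b', 'c', 'd', 'e'])

# Filter first; the parent dataframe is left untouched
filtered = dataframe1[dataframe1['a'] > .5]

# Then compute the summaries into a separate dataframe
summary = filtered.agg(['mean', 'max'])
print(summary)
```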

If you really insist on using iterate+append instead of a filter, even knowing all the caveats, then create an empty summary dataframe and append to that as you iterate. Only after you have finished iterating, and only if you really need to, append it back to the parent dataframe.
