
Using a for loop to create a new data frame based on an old data frame from multiple conditions

I am new to Python and am trying to write code that creates a new dataframe based on conditions from an old dataframe, combined with the result in the cell above in the new dataframe.

Here is an example of what I am trying to do:

  1. The first dataframe is the raw data.

  2. I need to create a new dataframe where, if the corresponding position in the raw data is 0, the result is 0; if it is greater than 0, the result is 1 plus the value in the row above.

  3. I need to remove any runs where the number of consecutive intervals doesn't reach at least 3.

[Image: visualization of the example]

This is how I think about the code, but being new to Python I am struggling.

From Raw data to Dataframe 2:

if (1,1) = 0 then (1a,1a) = 0;  # line 1
    else (1a,1a) = 1;

if (2,1) = 0 then (2a,1a) = 0;  # line 2
    else (2a,1a) = (1a,1a) + 1 = 2;

if (3,1) = 0 then (3a,1a) = 0;  # line 3
    else (3a,1a) = (2a,1a) + 1 = 3;
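
One way to read the pseudocode above is as a running counter that resets at every zero. The sketch below is a minimal illustration of that reading on a single column held in a plain list; the function name `running_count` and the sample values are assumptions made for the example, not part of the original data.

```python
def running_count(values):
    """For each position, return the length of the non-zero run ending there."""
    counts = []
    streak = 0  # value of the "cell above" in the new column
    for value in values:
        # Zero in the raw data resets the counter; anything else extends it.
        streak = streak + 1 if value != 0 else 0
        counts.append(streak)
    return counts


raw_column = [5, 7, 0, 0, 1, 21, 6, 15]  # hypothetical sample values
print(running_count(raw_column))  # [1, 2, 0, 0, 1, 2, 3, 4]
```

To apply it to a whole DataFrame `df`, something like `pd.DataFrame({c: running_count(df[c]) for c in df.columns}, index=df.index)` should work, since the function only iterates over each column's values.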

From Dataframe 2 to 3:

If any of the last 3 rows is greater than 3, then return that cell's value; otherwise return 0.
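
The rule above can be interpreted in more than one way. The sketch below assumes one reading: a run is kept only if its final count reaches a minimum length, and every cell of a shorter run becomes 0. The name `drop_short_runs`, the `min_len` parameter, and the sample list are hypothetical; the default of 3 follows the "at least 3" wording in step 3.

```python
def drop_short_runs(counts, min_len=3):
    """Zero out every run in a running-count column that ends below min_len."""
    result = list(counts)
    run_len = 0  # full length of the run the current position belongs to
    # Walk backwards: the last cell of each run holds that run's full length.
    for i in range(len(result) - 1, -1, -1):
        if result[i] == 0:
            run_len = 0
            continue
        if run_len == 0:
            run_len = result[i]
        if run_len < min_len:
            result[i] = 0
    return result


counts = [1, 2, 0, 0, 1, 2, 3, 4]  # hypothetical running-count column
print(drop_short_runs(counts))  # [0, 0, 0, 0, 1, 2, 3, 4]
```

Walking backwards avoids a second pass to look up run lengths, but it relies on the input being a running count that resets to 0 between runs.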

I am not sure how to make any of this work. If there is an easier way to do or think about this than what I am doing, please let me know. Any help is appreciated!

Based on your question, the output I was able to generate was:

Earlier, the DataFrame looked like so:
       A  B   C
0.05   5  0   0
0.10   7  0   1
0.15   0  0  12
0.20   0  4   3
0.25   1  0   5
0.30  21  5   0
0.35   6  0   9
0.40  15  0   0

Now, the DataFrame looks like so:
      A  B  C
0.05  0  0  0
0.10  0  0  1
0.15  0  0  2
0.20  0  0  3
0.25  1  0  4
0.30  2  0  0
0.35  3  0  0
0.40  4  0  0

The code I used for this is given below. Copy it into a new file, say code.py, and run it:

import re
import pandas as pd

def get_continous_runs(ext_list, threshold):
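    # Encode each cell as one character: "1" if non-zero, "0" otherwise.
    # One character per row keeps regex positions aligned with row positions,
    # even for float data (where str(0.0) would be the three characters "0.0").
    flags = ["1" if value != 0 else "0" for value in ext_list]
    samp = "".join(flags)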
    finder = re.finditer(r"1{%s,}" % threshold, samp)
    ranges = [x.span() for x in finder]
    return ranges

def build_column(ranges, max_len):
    answer = [0]*max_len
    for r in ranges:
        start = r[0]
        run_len = r[1] - start
        for i in range(run_len):
            answer[start+i] = i + 1
    return answer

def main(df):
    print("Earlier, the DataFrame looked like so:")
    print(df)
    ndf = df.copy()
    for col_name, col_data in df.items():  # iteritems() was removed in pandas 2.0
        ranges = get_continous_runs(col_data.values, 4)
        column_len = len(col_data.values)
        new_column = build_column(ranges, column_len)
        ndf[col_name] = new_column
    print("\nNow, the DataFrame looks like so:")
    print(ndf)
    return

if __name__ == '__main__':
    raw_data = [
        (5,0,0), (7,0,1), (0,0,12), (0,4,3),
        (1,0,5), (21,5,0), (6,0,9), (15,0,0),
    ]

    df = pd.DataFrame(
        raw_data,
        columns=list("ABC"),
        index=[0.05,0.10,0.15,0.20,0.25,0.30,0.35,0.40]
    )

    main(df)

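You can adjust the threshold on line #28 (the second argument to get_continous_runs, currently 4) to require a different minimum number of consecutive intervals. A threshold of 4 keeps only runs of more than 3.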
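
To see how the threshold changes the result, here is a separate sketch that swaps the regex for pandas `groupby` and `cumsum`. It is not the code above, and the function name `count_long_runs` is hypothetical; the data is the same sample DataFrame.

```python
import pandas as pd


def count_long_runs(df, threshold):
    """Number each non-zero run 1, 2, 3, ... and zero out runs shorter than threshold."""
    result = {}
    for col in df.columns:
        is_zero = df[col].eq(0)
        run_id = is_zero.cumsum()          # increases at every zero, so each run gets its own id
        nonzero = (~is_zero).astype(int)
        position = nonzero.groupby(run_id).cumsum()         # position inside the run
        run_len = nonzero.groupby(run_id).transform("sum")  # total length of the run
        result[col] = position.where(run_len >= threshold, 0)
    return pd.DataFrame(result, index=df.index)


df = pd.DataFrame(
    [(5, 0, 0), (7, 0, 1), (0, 0, 12), (0, 4, 3),
     (1, 0, 5), (21, 5, 0), (6, 0, 9), (15, 0, 0)],
    columns=list("ABC"),
    index=[0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40],
)

print(count_long_runs(df, 4))  # same table as the output shown earlier
print(count_long_runs(df, 2))  # column A now also keeps its first run (1, 2)
```

On this sample, thresholds of 3 and 4 give the same table because no column contains a run of exactly 3 non-zero values.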

As always, start by reading the main() function to understand how everything works. I have tried to use good variable names to aid understanding. My method might seem a little contrived because I am using regex, but I didn't want to overwhelm a beginner with a custom run-length counter.
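
For readers curious about the non-regex route, a custom run-length counter could look roughly like the sketch below. It is meant as a possible drop-in for get_continous_runs, returning the same kind of (start, end) spans with an exclusive end; the name `get_runs_without_regex` is hypothetical.

```python
def get_runs_without_regex(values, threshold):
    """Return (start, end) spans, end exclusive, of non-zero runs at least threshold long."""
    spans = []
    start = None  # index where the current run began, or None outside a run
    for i, value in enumerate(values):
        if value != 0:
            if start is None:
                start = i
        elif start is not None:
            # A zero closes the current run; keep it only if it is long enough.
            if i - start >= threshold:
                spans.append((start, i))
            start = None
    # A run that reaches the last row is never closed by a zero, so close it here.
    if start is not None and len(values) - start >= threshold:
        spans.append((start, len(values)))
    return spans


print(get_runs_without_regex([5, 7, 0, 0, 1, 21, 6, 15], 4))  # [(4, 8)]
print(get_runs_without_regex([0, 1, 12, 3, 5, 0, 9, 0], 4))   # [(1, 5)]
```

Because the spans have the same shape as those produced by `re.finditer(...).span()`, they can be passed to build_column unchanged.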

