I am new to Python and am trying to write code that creates a new dataframe based on conditions in an old dataframe, combined with the value in the cell above in the new dataframe.
Here is an example of what I am trying to do:
is the raw data
I need to create a new dataframe where, if the corresponding position in the raw data is 0, the result is 0; if it is greater than 0, the result is 1 plus the value in the row above.
I then need to remove any runs where the number of consecutive intervals doesn't reach at least 3.
This is how I think about the logic, but being new to Python I am struggling.
From Raw data to Dataframe 2:
if (1,1)=0 then (1a,1a)=0; # line 1
else (1a,1a)=1;
if (2,1)=0 then (2a,1a)=0; # line 2
else (2a,1a)= (1a,1a)+1 = 2;
if (3,1)=0 then (3a,1a)=0; # line 3
From Dataframe 2 to 3:
If any of the last 3 rows is greater than 3, then return that cell's value, else return 0.
I am not sure how to make any of this work. If there is an easier way to do or think about this than what I am doing, please let me know. Any help is appreciated!
Based on your question, the output I was able to generate was:
Earlier, the DataFrame looked like so:
       A  B   C
0.05   5  0   0
0.10   7  0   1
0.15   0  0  12
0.20   0  4   3
0.25   1  0   5
0.30  21  5   0
0.35   6  0   9
0.40  15  0   0
Now, the DataFrame looks like so:
      A  B  C
0.05  0  0  0
0.10  0  0  1
0.15  0  0  2
0.20  0  0  3
0.25  1  0  4
0.30  2  0  0
0.35  3  0  0
0.40  4  0  0
The code I used is given below. Copy it into a new file, say code.py, and run it.
import re
import pandas as pd


def get_continous_runs(ext_list, threshold):
    # Encode the column as a string of flags: "1" for nonzero, "0" for zero.
    samp = "".join("1" if value != 0 else "0" for value in ext_list)
    # Match every stretch of at least `threshold` consecutive "1"s.
    finder = re.finditer(r"1{%s,}" % threshold, samp)
    # Each span is a (start, end) pair with an exclusive end.
    ranges = [x.span() for x in finder]
    return ranges


def build_column(ranges, max_len):
    # Start from all zeros, then number each qualifying run 1, 2, 3, ...
    answer = [0] * max_len
    for r in ranges:
        start = r[0]
        run_len = r[1] - start
        for i in range(run_len):
            answer[start + i] = i + 1
    return answer


def main(df):
    print("Earlier, the DataFrame looked like so:")
    print(df)
    ndf = df.copy()
    # DataFrame.iteritems() was removed in pandas 2.0; items() is the replacement.
    for col_name, col_data in df.items():
        ranges = get_continous_runs(col_data.values, 4)
        column_len = len(col_data.values)
        new_column = build_column(ranges, column_len)
        ndf[col_name] = new_column
    print("\nNow, the DataFrame looks like so:")
    print(ndf)


if __name__ == '__main__':
    raw_data = [
        (5, 0, 0), (7, 0, 1), (0, 0, 12), (0, 4, 3),
        (1, 0, 5), (21, 5, 0), (6, 0, 9), (15, 0, 0),
    ]
    df = pd.DataFrame(
        raw_data,
        columns=list("ABC"),
        index=[0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40],
    )
    main(df)
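To require a different minimum run length, change the threshold passed to get_continous_runs() inside main(). It is currently 4 (i.e. more than 3 consecutive intervals); if runs of exactly 3 should also be kept, as "at least 3" in the question suggests, pass 3 instead.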
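If you would rather have the threshold as an explicit parameter, the same result can probably be reached with pandas operations alone. The following is only a sketch, not the code that produced the output above: the function name count_runs_pandas and the parameter min_run are names I made up for illustration, and it assumes that "nonzero" is the right test for an active interval.

```python
import pandas as pd


def count_runs_pandas(df, min_run=4):
    """Number consecutive nonzero rows 1, 2, 3, ... per column.

    Runs shorter than min_run are replaced by 0.
    (Hypothetical helper; name and parameter are illustrative.)
    """
    out = pd.DataFrame(index=df.index)
    for col in df.columns:
        nonzero = df[col].ne(0)
        # The id increases at every zero, so each nonzero stretch shares one id.
        run_id = (~nonzero).cumsum()
        # Summing the 0/1 flags within each id gives the running count.
        counts = nonzero.astype(int).groupby(run_id).cumsum()
        # The largest count within an id is the length of that run.
        run_len = counts.groupby(run_id).transform("max")
        # Keep the count only where the whole run is long enough.
        out[col] = counts.where(run_len >= min_run, 0)
    return out
```

Calling count_runs_pandas(df, min_run=4) on the example DataFrame should reproduce the table shown above, and a smaller min_run keeps shorter runs as well.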
As always, start by reading the main() function to understand how everything works. I have tried to use descriptive variable names to aid understanding. My method might seem a little contrived because it uses a regex, but I didn't want to overwhelm a beginner with a custom run-length counter.
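For reference, a custom run-length counter does not have to be long. Here is a minimal sketch in plain Python that follows the two steps from the question: first build the running count, then zero out runs that are too short. The name running_counts and the parameter min_run are made up for this example, and it works on one column at a time.

```python
def running_counts(values, min_run):
    """Running count of consecutive nonzero values.

    Runs shorter than min_run are set back to 0.
    (Hypothetical helper; name and parameter are illustrative.)
    """
    # Step 1: 0 stays 0, anything else is 1 plus the count in the row above.
    counts = []
    current = 0
    for value in values:
        current = current + 1 if value != 0 else 0
        counts.append(current)

    # Step 2: walk backwards. The last cell of each run holds the run length,
    # so we can decide per run whether to keep it or zero it out.
    result = counts[:]
    i = len(counts) - 1
    while i >= 0:
        run_len = counts[i]
        if run_len == 0:
            i -= 1
            continue
        if run_len < min_run:
            for j in range(i - run_len + 1, i + 1):
                result[j] = 0
        i -= run_len
    return result
```

To apply it to a whole DataFrame you could loop over the columns as main() does, for example ndf[col_name] = running_counts(col_data.values, 4).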