
Comparing to the next/previous values in a loop over a Python DataFrame

How can I compare a value to the next or previous item in a loop? I need to count the consecutive repetitions of occurrences in each column.

After that I need to create a "frequency table", so dfoutput should look like the picture at the bottom.

This code doesn't work because I can't compare to another item.

Maybe there is another, simpler way to do this without looping?

sumrep=0

df = pd.DataFrame(data = {'1' : [0,0,1,0,1,1,0,1,1,0,1,1,1,1,0],'2' : [0,0,1,1,1,1,0,0,1,0,1,1,0,1,0]})
df.index= [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]   # It will be easier to assign repetitions in output df - index will be equal to number of repetitions

dfoutput = pd.DataFrame(0,index=[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15],columns=['1','2'])

#example for column 1
for val1 in df.columns[1]:                           
    if val1 == 1 and val1 ==0:   #can't find the way to check NEXT val1 (one row below) in column 1 :/
        if sumrep==0:            
            dfoutput.loc[1,1]=dfoutput.loc[1,1]+1   #count only SINGLE occurrences of values and assign them to row number 1 in dfoutput
        if sumrep>0:
            dfoutput.loc[sumrep,1]=dfoutput.loc[sumrep,1]+1   #count repeated occurrences greater than 1 and assign them to the proper row in dfoutput
            sumrep=0
    elif val1 == 1 and df[val1+1]==1 :
        sumrep=sumrep+1

Desired output table for column 1 - dfoutput:

[Image not reproduced: desired output table for column 1]

I don't understand why there isn't a simple method to move around a dataframe, like the Offset function in VBA in Excel :/

You can use the function defined here to perform fast run-length-encoding:

import numpy as np

def rlencode(x, dropna=False):
    """
    Run length encoding.
    Based on http://stackoverflow.com/a/32681075, which is based on the rle
    function from R.

    Parameters
    ----------
    x : 1D array_like
        Input array to encode
    dropna: bool, optional
        Drop all runs of NaNs.

    Returns
    -------
    start positions, run lengths, run values

    """
    where = np.flatnonzero
    x = np.asarray(x)
    n = len(x)
    if n == 0:
        return (np.array([], dtype=int),
                np.array([], dtype=int),
                np.array([], dtype=x.dtype))

    starts = np.r_[0, where(~np.isclose(x[1:], x[:-1], equal_nan=True)) + 1]
    lengths = np.diff(np.r_[starts, n])
    values = x[starts]

    if dropna:
        mask = ~np.isnan(values)
        starts, lengths, values = starts[mask], lengths[mask], values[mask]

    return starts, lengths, values
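To make the three return values concrete, here is a small trace of the core lines of the function on column '1' from the question. It is only an illustration of the intermediate arrays, not a replacement for the function; the name `changed` is introduced here for readability and does not appear in the original.

```python
import numpy as np

# Column '1' from the question.
x = np.asarray([0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0])
n = len(x)

# True wherever an element differs from its predecessor (a run boundary).
changed = ~np.isclose(x[1:], x[:-1], equal_nan=True)

# Position 0 always starts a run; each boundary starts the next run.
starts = np.r_[0, np.flatnonzero(changed) + 1]
# Distance between consecutive starts (plus the end) is the run length.
lengths = np.diff(np.r_[starts, n])
# The value of each run is the element at its start position.
values = x[starts]

print(starts.tolist())   # [0, 2, 3, 4, 6, 7, 9, 10, 14]
print(lengths.tolist())  # [2, 1, 1, 2, 1, 2, 1, 4, 1]
print(values.tolist())   # [0, 1, 0, 1, 0, 1, 0, 1, 0]

# Lengths of the runs of 1s only.
print(lengths[values == 1].tolist())  # [1, 2, 2, 4]
```

So column '1' contains one run of length 1, two runs of length 2 and one run of length 4, which is exactly what the frequency table has to count.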

With this function your task becomes a lot easier:

import pandas as pd
from collections import Counter
from functools import partial

def get_frequency_of_runs(col, value=1, index=None):
     _, lengths, values = rlencode(col)
     return pd.Series(Counter(lengths[np.where(values == value)]), index=index)

df = pd.DataFrame(data={'1': [0,0,1,0,1,1,0,1,1,0,1,1,1,1,0],
                        '2': [0,0,1,1,1,1,0,0,1,0,1,1,0,1,0]})
df.apply(partial(get_frequency_of_runs, index=df.index)).fillna(0)
#       1    2
# 0   0.0  0.0
# 1   1.0  2.0
# 2   2.0  1.0
# 3   0.0  0.0
# 4   1.0  1.0
# 5   0.0  0.0
# 6   0.0  0.0
# 7   0.0  0.0
# 8   0.0  0.0
# 9   0.0  0.0
# 10  0.0  0.0
# 11  0.0  0.0
# 12  0.0  0.0
# 13  0.0  0.0
# 14  0.0  0.0
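As for moving around a dataframe the way Offset does in VBA: `Series.shift` is probably the closest equivalent, since `col.shift(1)` lines each row up with the previous one and `col.shift(-1)` with the next one. The sketch below uses it for a pandas-only variant of the same count. This is an alternative to the approach above, not the same method; the helper name `run_length_frequencies` is made up for this example, and the final `reindex` to 1..15 assumes you want the layout from the question, where the row label is the run length.

```python
import pandas as pd

def run_length_frequencies(col, value=1):
    """Count how often each run length of `value` occurs in `col`."""
    # A new run starts wherever a row differs from the previous row;
    # the cumulative sum turns those start flags into a run id per row.
    run_id = (col != col.shift()).cumsum()
    # Rows per run id (restricted to runs of `value`) = length of each run.
    run_lengths = run_id[col == value].value_counts()
    # Counting the lengths themselves gives the frequency table.
    return run_lengths.value_counts()

df = pd.DataFrame(data={'1': [0,0,1,0,1,1,0,1,1,0,1,1,1,1,0],
                        '2': [0,0,1,1,1,1,0,0,1,0,1,1,0,1,0]})

freq = (pd.DataFrame({c: run_length_frequencies(df[c]) for c in df.columns})
          .reindex(range(1, len(df) + 1))  # row label = run length
          .fillna(0)
          .astype(int))

print(freq.loc[[1, 2, 4]])
# Rows 1, 2 and 4 hold 1, 2, 1 for column '1' and 2, 1, 1 for column '2';
# every other row is 0.
```

Because the row labels start at 1 here, the counts sit in the rows the question asked for, and no explicit loop over the rows is needed.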

