Loop through a Pandas dataframe and copy rows to a new dataframe based on a condition

I have a dataframe df with 6000+ rows of data, a datetime index in the form YYYY-MM-DD, and the columns ID, water_level and change.

I want to:

  1. Loop through each value in the column change and identify turning points.
  2. When I find a turning point, copy that entire row of data, including the index, into a new dataframe, e.g. turningpoints_df.
  3. For each new turning point identified in the loop, add that row of data to turningpoints_df, so that I end up with something like this:
               ID    water_level    change
date           
2000-10-01      2         5.5        -0.01
2000-12-13     40        10.0         0.02
2001-02-10    150         1.1       -0.005
2001-07-29    201        12.4         0.01
...           ...         ...          ...

I was thinking of taking a positional approach, so something like this (purely illustrative):

turningpoints_df = pd.DataFrame(columns = ['ID', 'water_level', 'change'])

for i in range(len(df['change'])):
    if [i-1] < 0 and [i+1] > 0:
        #this is a min point and take this row and copy to turningpoints_df
    elif [i-1] > 0 and [i+1] < 0:
        #this is a max point and take this row and copy to turningpoints_df
    else: 
        pass 

My issue is that I'm not sure how to compare each value in my change column against the values before and after it, and then how to pull that row of data out into a new df when the conditions are met.

It sounds like you want to make use of the shift method of the DataFrame.

# shift values down by 1 (each row sees the previous row's change):
df["change_down"] = df["change"].shift(1)

# shift values up by 1 (each row sees the next row's change):
df["change_up"] = df["change"].shift(-1)

You should then be able to compare the values in each row and proceed with whatever you're trying to achieve.

for idx, row in df.iterrows():
    # check conditions here, using the current row plus its shifted neighbours
    pass
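As a rough sketch of where the shift idea can lead, the comparison can also be done on whole columns at once and used as a boolean mask, which avoids the explicit loop. The sample data and the names prev_change, next_change, is_min and is_max are made up for illustration; turningpoints_df and the min/max conditions are taken from the question.

```python
import pandas as pd

# Small made-up sample with a date index, shaped like the question's df
df = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4, 5, 6],
        "water_level": [5.6, 5.5, 5.7, 6.0, 5.8, 5.9],
        "change": [-0.02, -0.01, 0.02, 0.03, -0.02, 0.01],
    },
    index=pd.to_datetime(
        ["2000-09-29", "2000-09-30", "2000-10-01",
         "2000-10-02", "2000-10-03", "2000-10-04"]
    ),
)
df.index.name = "date"

# Neighbouring values aligned with each row; the ends become NaN
prev_change = df["change"].shift(1)
next_change = df["change"].shift(-1)

# Same conditions as the question's loop, applied to every row at once
is_min = (prev_change < 0) & (next_change > 0)
is_max = (prev_change > 0) & (next_change < 0)

# Boolean indexing keeps the matching rows together with their date index
turningpoints_df = df[is_min | is_max].copy()
print(turningpoints_df)
```

Comparisons against NaN evaluate to False, so the first and last rows are never selected. Note that these conditions only look at the neighbours, so two adjacent rows can both be flagged around a single sign change.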

NumPy's roll() lets you shift a series forwards or backwards. With the previous and next values on the same row as the current one, you can then use a simple function with apply() to run your logic, since everything it needs is on that row.

import datetime as dt
import random
from decimal import Decimal

import numpy as np
import pandas as pd

# build sample data: one row per day with random ID, water_level and change
d = list(pd.date_range(dt.datetime(2000,1,1), dt.datetime(2010,12,31)))
df = pd.DataFrame({"date":d, "ID":[random.randint(1,200) for x in d], 
     "water_level":[round(Decimal(random.uniform(1,13)),2) for x in d], 
      "change":[round(Decimal(random.uniform(-0.05, 0.05)),3) for x in d]})

# have ref to prev and next, just apply logic
def turningpoint(r):
    r["turningpoint"] = (r["prev_change"] < 0 and r["next_change"] > 0) or \
        (r["prev_change"] > 0 and r["next_change"] < 0)
    return r

# use numpy to shift "change" so have prev and next on same row as new columns
# initially default turningpoint boolean
df = df.assign(prev_change=np.roll(df["change"],1), 
          next_change=np.roll(df["change"],-1),
          turningpoint=False).apply(turningpoint, axis=1).drop(["prev_change", "next_change"], axis=1)
# first and last rows cannot be turning points (np.roll wraps around at the ends)
df.loc[df.index[0], "turningpoint"] = False
df.loc[df.index[-1], "turningpoint"] = False

# take a copy of all rows that are turningpoints into new df with index
df_turningpoint = df[df["turningpoint"]].copy()
df_turningpoint
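The first and last rows are reset by hand above because np.roll wraps around: the last value becomes the "previous" value of the first row, and the first value becomes the "next" value of the last row. A small check on a made-up series (for illustration only) shows the difference from shift, which fills the gap with NaN instead:

```python
import numpy as np
import pandas as pd

s = pd.Series([-0.01, 0.02, -0.03, 0.04])

# np.roll wraps: the last element reappears at position 0
rolled_prev = np.roll(s.to_numpy(), 1)

# Series.shift does not wrap: position 0 becomes NaN
shifted_prev = s.shift(1)

print(rolled_prev)
print(shifted_prev.to_numpy())
```

Apart from that first position, the two results hold the same values, so either approach works as long as the end rows are handled.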

 