Modifying Wide Pandas Data frame based on Condition
I am trying to edit values in wide-format time series data based on a condition, using the pandas library in Python. The data are satellite observation values on given dates (see photo below). The first column is a unique id and all subsequent columns are dates, so each row is the time series for that specific id.
The idea is this: if n1 is the current observation, n2 is the next observation, and n3 is the observation after that, then:
if ((n2 - n1) > 0.3) and (n3 >= (0.9 * n1)):
    n2 = (n1 + n3) / 2
Just to be clear, n1, n2, n3 are the first three values of this data frame, not attributes. For the attached example, n1 would be 0.25916876, n2 would be 0.25916876, and n3 would be 0.23824187.
How can I modify my data frame with this rule? Could this be done with a list comprehension?
If your dataframe is named df, then you can try:
mask = (df.n2 - df.n1 > 0.3) & (df.n3 >= (0.9 * df.n1))
# where() returns a new Series, so assign the result back
df["n2"] = df.n2.where(~mask, (df.n1 + df.n3) / 2)
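A minimal sketch of this masking idea on a toy frame, following the rule as stated in the question. It assumes the three consecutive observations sit in columns literally named n1, n2 and n3; those names and the sample values are hypothetical and not taken from the original data.

```python
import pandas as pd

# Toy data: n1, n2, n3 are hypothetical column names for three consecutive observations
df = pd.DataFrame({
    "n1": [0.20, 0.25, 0.30],
    "n2": [0.60, 0.26, 0.70],
    "n3": [0.22, 0.24, 0.10],
})

# Rule from the question: n2 jumps by more than 0.3 above n1,
# and n3 is at least 90% of n1
mask = (df["n2"] - df["n1"] > 0.3) & (df["n3"] >= 0.9 * df["n1"])

# Series.where keeps values where the condition is True and replaces the rest,
# so pass ~mask to replace only the flagged rows
df["n2"] = df["n2"].where(~mask, (df["n1"] + df["n3"]) / 2)

print(df)
```

Only the first row satisfies both conditions here, so only its n2 is replaced by the average of n1 and n3.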
I assume you want to do this process for each column of the dataframe. This works on a fake dataframe I created to replicate the process:
import numpy as np

# Iterate over each column
for c in list(df):
    prev = df[c].shift(1, fill_value=0)   # previous observation (n1)
    nxt = df[c].shift(-1, fill_value=0)   # next observation (n3)
    df[c] = np.where((df[c] - prev > 0.3) & (nxt >= 0.9 * prev),
                     (prev + nxt) / 2,    # element-wise average of the neighbours
                     df[c])
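Since the question describes a wide layout (one row per id, one column per date), the same rule can also be applied along each row rather than down each column. The following is only a sketch under stated assumptions: the id column is named "id", the remaining columns are in chronological order, and the function name smooth_spikes, its parameters, and the sample values are all hypothetical. It evaluates the condition against the original neighbouring values in one pass, so a corrected value does not feed into the next comparison.

```python
import pandas as pd

def smooth_spikes(wide, id_col="id", jump=0.3, ratio=0.9):
    """Replace spikes along each row with the mean of the two neighbours.

    Hypothetical helper: assumes `id_col` holds the ids and every other
    column is a date, ordered chronologically from left to right.
    """
    values = wide.drop(columns=id_col).astype(float)
    prev = values.shift(1, axis=1)    # n1: observation one date earlier
    nxt = values.shift(-1, axis=1)    # n3: observation one date later
    # The first and last dates have a NaN neighbour; comparisons with NaN
    # are False, so those edge columns are never modified.
    mask = ((values - prev) > jump) & (nxt >= ratio * prev)
    fixed = values.mask(mask, (prev + nxt) / 2)
    return pd.concat([wide[[id_col]], fixed], axis=1)

# Toy wide frame with made-up dates and values
wide = pd.DataFrame({
    "id": [1, 2],
    "2020-01-01": [0.20, 0.25],
    "2020-01-09": [0.60, 0.26],
    "2020-01-17": [0.22, 0.24],
})
result = smooth_spikes(wide)
print(result)
```

This also speaks to the list comprehension question: shifting the whole frame along axis=1 avoids an explicit Python loop over rows, although a loop would be needed if each correction had to see previously corrected values.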