How to get consecutive averages of column values based on a condition from another column in the same data frame using pandas
I have a large data frame in pandas with two columns, Time and Values. I want to calculate consecutive averages of the values in column Values, based on a condition formed from column Time.
That is, I want the average of the first l values in column Values, then of the next l values from the same column, and so on until the end of the data frame.
The value l is the number of values that go into each average, and it is determined by the time difference in column Time.
The starting data frame looks like this:
Time Values
t1 v1
t2 v2
t3 v3
... ...
tk vk
For example, an average needs to be taken every 2 seconds, and the number of time values inside that time difference determines the number of values l over which the average is calculated. a1 would be the first average of l values, a2 the next, and so on.
The second part of the question is the same calculation of averages, but with the number l known in advance. I tried this:
df['Time'].iloc[0:l].mean()
which works for the first l values.
In addition, I need to store the average values in another data frame with columns Time and Averages, for plotting with matplotlib.
How can I use pandas to achieve my goal?
I have tried the following:
import pandas as pd
df = pd.DataFrame({'Time': [1595006371.756430732, 1595006372.502789381, 1595006373.784446912, 1595006375.476658051], 'Values': [4, 5, 6, 10]}, index=list('abcd'))
I get:
Time Values
a 1595006371.756430732 4
b 1595006372.502789381 5
c 1595006373.784446912 6
d 1595006375.476658051 10
Time is in the format seconds.milliseconds.
If I expect to have the same number of values in every 2 seconds until the end of the data frame, I can use the following loop to calculate the value of l:
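s = 1
l = 0
# .iloc is used because the index holds labels ('a', 'b', ...), not positions;
# the bound check stops the loop if every row lies within the first 2 seconds
while s < len(df) and df['Time'].iloc[s] - df['Time'].iloc[0] <= 2:
    s += 1
    l += 1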
Could this be done differently, without the loop? And how can I do this if the number l is not expected to be the same inside each averaging interval?
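One loop-free possibility is sketched below. It assumes that Time is sorted in ascending order and that the 2-second intervals are counted from the first timestamp; neither is stated explicitly in the question. The names `window`, `elapsed`, `bin_id`, `counts` and `averages` are illustrative and not from the original post.

```python
import pandas as pd

df = pd.DataFrame({'Time': [1595006371.756430732, 1595006372.502789381,
                            1595006373.784446912, 1595006375.476658051],
                   'Values': [4, 5, 6, 10]}, index=list('abcd'))

window = 2  # seconds per averaging interval (assumed, as in the example)

# Seconds elapsed since the first row
elapsed = df['Time'] - df['Time'].iloc[0]

# Rows inside the first window, first row included, counted without a loop
l = int((elapsed <= window).sum())

# Interval number of each row: 0 for [0, 2) s, 1 for [2, 4) s, ...
bin_id = (elapsed // window).astype(int).rename('bin')

# groupby copes with intervals that hold different numbers of rows
counts = df.groupby(bin_id)['Values'].size()
averages = df.groupby(bin_id)['Values'].mean()
```

Because the grouping key is derived from Time itself, l does not have to be the same in every interval; `counts` reports how many rows each average was taken over. Note that the mask uses `<=` to mirror the loop in the question, whereas the integer division places a row lying exactly on a 2-second boundary into the next interval.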
For the given l, I want to calculate the average of every l elements in another column, for example column Values, and to populate column Averages of data frame df1 with these values. I tried the following code:
p = 0
df1 = pd.DataFrame(columns=['Time', 'Averages'])
for w in range(0, len(df) - 1, 2):
    # .loc adds row p if needed; chained df1['Averages'][p] = ... does not
    df1.loc[p, 'Averages'] = df['Values'].iloc[w:w+2].mean()
    p = p + 1
Is there any other way to calculate these averages?
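When l is known in advance, one alternative to the loop is to label each block of l consecutive rows by position and aggregate per label. This is only a sketch: the question does not say which timestamp should accompany each average, so taking the first Time of every block is an assumption, and `chunk` is an illustrative name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Time': [1595006371.756430732, 1595006372.502789381,
                            1595006373.784446912, 1595006375.476658051],
                   'Values': [4, 5, 6, 10]}, index=list('abcd'))

l = 2  # rows per average, assumed known in advance

# Positional block label: 0 for the first l rows, 1 for the next l rows, ...
chunk = np.arange(len(df)) // l

# One row per block: first timestamp of the block (assumed) and mean of Values
df1 = (df.groupby(chunk)
         .agg(Time=('Time', 'first'), Averages=('Values', 'mean'))
         .reset_index(drop=True))
```

If the number of rows is not a multiple of l, the last block is simply shorter and its average is taken over fewer values.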
To clarify a bit more: I have two columns, Time and Values. I want to determine how many consecutive values from column Values should be averaged at a time.
I do that by determining this number l from column Time, by counting how many rows fall inside a time difference of 2 seconds.
Once I have determined that value, for example 2, I average the first two values from column Values, then the next two, and so on until the end of the data frame.
At the end, I store these averages in a separate column of another data frame.
I would appreciate your assistance.
You talk about Time and Value, and then about groups of consecutive rows.
If you want to group by consecutive rows and get the mean of Time and Value, this does it for you. You really need to show by example what you are trying to achieve.
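import datetime as dt
import random
import pandas as pd
d = list(pd.date_range(dt.datetime(2020, 7, 1), dt.datetime(2020, 7, 2), freq="15min"))
df = pd.DataFrame({"Time": d,
                   "Value": [round(random.uniform(0, 1), 6) for x in d]})
df
n = 5  # number of consecutive rows per group
# Label each block of n consecutive rows, then average Time and Value per block
df.assign(grp=df.index // n).groupby("grp").agg({"Time": lambda s: s.mean(), "Value": "mean"})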
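If the grouping should follow the clock rather than a fixed row count, a time-based variant might look like the sketch below. It assumes the Time values from the question are seconds since the Unix epoch, and it uses `resample` with `origin='start'` (available in pandas 1.1 and later) so that the 2-second bins begin at the first timestamp. This is an alternative technique, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({'Time': [1595006371.756430732, 1595006372.502789381,
                            1595006373.784446912, 1595006375.476658051],
                   'Values': [4, 5, 6, 10]})

# Interpret Time as seconds since the Unix epoch (an assumption about the data)
ts = pd.to_datetime(df['Time'], unit='s')

# Average Values in consecutive 2-second bins that start at the first timestamp
df1 = (df.set_index(ts)['Values']
         .resample('2s', origin='start')
         .mean()
         .rename('Averages')
         .reset_index())
```

The result has columns Time (the start of each bin) and Averages, which should be usable directly with `df1.plot(x='Time', y='Averages')`. Bins that contain no rows come out as NaN and can be dropped with `dropna()` if they are not wanted.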