
How to get consecutive averages of column values based on a condition from another column in the same data frame using pandas

I have a large pandas data frame with two columns, Time and Values. I want to calculate consecutive averages of the values in column Values, based on a condition formed from column Time. That is, I want the average of the first l values in column Values, then of the next l values from the same column, and so on until the end of the data frame. The value l is the number of values that go into each average, and it is determined by the time difference in column Time. The starting data frame looks like this:

Time   Values
t1     v1
t2     v2
t3     v3
...    ...
tk     vk

For example, an average needs to be taken every 2 seconds, and the number of time values inside that 2-second interval determines the number of values l over which the average is calculated. a1 would be the first average of l values, a2 the next, and so on.
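One possible reading of this requirement is sketched below: each row is labelled with the index of the 2-second window it falls into, counted from the first timestamp, and Values is averaged per label. The sample times and values are made up for illustration, and the names `window`, `bin_id`, and the extra `Count` column are assumptions, not part of the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data: times in seconds, sorted in ascending order
df = pd.DataFrame({
    'Time': [0.0, 0.7, 2.0, 2.5, 3.9, 4.1],
    'Values': [4, 5, 6, 10, 2, 8],
})

window = 2.0  # length of each averaging interval in seconds

# Index of the 2-second window each row belongs to, counted from the first row
t0 = df['Time'].iloc[0]
bin_id = np.floor((df['Time'] - t0) / window).astype(int)

# Mean and row count per window; the count plays the role of l
means = df.groupby(bin_id)['Values'].mean()
counts = df.groupby(bin_id)['Values'].size()

df1 = pd.DataFrame({
    'Time': t0 + means.index.to_numpy() * window,  # start of each window
    'Averages': means.to_numpy(),
    'Count': counts.to_numpy(),
})
print(df1)
```

With this approach l does not have to be the same in every window, and windows that contain no rows simply do not appear in the result.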

The second part of the question is the same calculation of averages, but for the case where the number l is known in advance. I tried this:

 df['Time'].iloc[0:l].mean()

which works for the first l values.

In addition, I need to store the average values in another data frame with columns Time and Averages, for plotting with matplotlib.

How can I use pandas to achieve my goal?

I have tried the following:

import pandas as pd

df = pd.DataFrame({'Time': [1595006371.756430732, 1595006372.502789381, 1595006373.784446912, 1595006375.476658051], 'Values': [4, 5, 6, 10]}, index=list('abcd'))

I get

   Time                     Values
a  1595006371.756430732       4   
b  1595006372.502789381       5  
c  1595006373.784446912       6   
d  1595006375.476658051      10  

Time is in the format seconds.milliseconds.
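If the Time values are seconds since the Unix epoch (an assumption; the question only says seconds with a fractional part), they can be converted to timestamps and averaged with `resample`. This is a sketch of that option, with `origin='start'` (available in pandas 1.1 and later) anchoring the 2-second windows at the first timestamp rather than at a round clock time.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'Time': [1595006371.756430732, 1595006372.502789381,
             1595006373.784446912, 1595006375.476658051],
    'Values': [4, 5, 6, 10],
}, index=list('abcd'))

# Assumption: Time holds seconds since the Unix epoch
stamps = pd.DatetimeIndex(pd.to_datetime(df['Time'], unit='s'))

# Values indexed by timestamp, then averaged in 2-second windows
series = pd.Series(df['Values'].to_numpy(), index=stamps)
resampled = series.resample(pd.Timedelta(seconds=2), origin='start').mean()
print(resampled)
```

Unlike a plain groupby, `resample` also emits a row (with NaN) for any window that contains no data, which may or may not be what is wanted for plotting.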

If I expect the same number of values in every 2 seconds until the end of the data frame, I can use the following loop to calculate the value of l:

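s = 1
l = 0
# Count the rows after the first whose time is within 2 seconds of the first row.
# .iloc is positional, which is needed because the index holds the labels 'a'..'d'.
while s < len(df) and df['Time'].iloc[s] - df['Time'].iloc[0] <= 2:
    s += 1
    l += 1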

Could this be done differently, without the loop? How can I do this if the number l is not expected to be the same inside each averaging interval?
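A loop-free version of the count could look like the sketch below. It assumes Time is sorted in ascending order, and it reproduces the loop's convention of counting only the rows after the first one. The names `window_id` and `rows_per_window` are illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'Time': [1595006371.756430732, 1595006372.502789381,
             1595006373.784446912, 1595006375.476658051],
    'Values': [4, 5, 6, 10],
}, index=list('abcd'))

times = df['Time'].to_numpy()
t0 = times[0]

# Same result as the loop: rows within 2 s of the first row, excluding the first row itself
l = int((times - t0 <= 2).sum()) - 1

# If l can differ between intervals, count the rows in every 2-second window instead
window_id = ((times - t0) // 2).astype(int)
rows_per_window = pd.Series(window_id).value_counts().sort_index()
print(l)
print(rows_per_window)
```

Note that the loop's l excludes the first row, so the first window here holds l + 1 rows; whether that offset is intended is worth checking against the desired averages.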

For the given l, I want to calculate the average of every l elements in another column, for example column Values, and to populate column Averages of data frame df1 with these values. I tried the following code:

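p = 0
df1 = pd.DataFrame(columns=['Time', 'Averages'])
for w in range(0, len(df) - 1, 2):
    # .loc adds row p to df1; chained df1['Averages'][p] = ... fails on an empty frame
    df1.loc[p, 'Averages'] = df['Values'].iloc[w:w + 2].mean()
    p = p + 1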

Is there any other way to calculate these averages?
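When l is known in advance, one alternative to the loop is to label rows by integer division of their position and let `groupby` compute the means. The sketch below assumes l = 2 and takes the first Time of each chunk as the Time of the average; that choice is an assumption, since the question does not say which time each average should carry.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'Time': [1595006371.756430732, 1595006372.502789381,
             1595006373.784446912, 1595006375.476658051],
    'Values': [4, 5, 6, 10],
}, index=list('abcd'))

l = 2  # number of consecutive rows per average, assumed known

# Rows 0..l-1 get label 0, rows l..2l-1 get label 1, and so on
chunk = np.arange(len(df)) // l

df1 = (
    df.groupby(chunk)
      .agg(Time=('Time', 'first'), Averages=('Values', 'mean'))
      .reset_index(drop=True)
)
print(df1)
```

If the number of rows is not a multiple of l, the last chunk is averaged over fewer rows, whereas the loop above skips a trailing single row.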

To clarify a bit more: I have two columns, Time and Values. I want to determine how many consecutive values from column Values should be averaged at a time. I get this number l from column Time by counting how many rows fall inside a time difference of 2 seconds. Once I have that value, for example 2, I average the first two values from column Values, then the next two, and so on until the end of the data frame. At the end, I store these averages in a separate column of another data frame.

I would appreciate your assistance.

You talk about Time and Value and then groups of consecutive rows.

If you want to group by consecutive rows and get the mean of Time and Value, this does it for you. You really need to show by example what you are trying to achieve.

import datetime as dt
import random
import pandas as pd
d = list(pd.date_range(dt.datetime(2020,7,1), dt.datetime(2020,7,2), freq="15min"))
df = pd.DataFrame({"Time":d, 
      "Value":[round(random.uniform(0, 1),6) for x in d]})

df

n = 5  # number of consecutive rows per group
df.assign(grp=df.index//n).groupby("grp").agg({"Time":lambda s: s.mean(),"Value":"mean"})


