
Create a new dataframe from an old dataframe where the new dataframe contains the row-wise average of columns at different locations in the old dataframe


I have a dataframe called "frame" with 16 columns and 201 rows. A screenshot of an example dataframe is attached.

[Screenshot: example dataframe]

Please note that the screenshot is just an example; the original dataframe is much larger.

I would like to find an efficient way (maybe using a for loop or writing a function) to row-wise average different columns in the dataframe. For instance, I want to average columns "rep" and "rep1", and columns "repcycle" and "repcycle1" (similarly for set and setcycle), and save the results in a new dataframe containing only the averaged columns.

I have tried writing code using iloc:

newdf= frame[['sample']].copy()
newdf['rep_avg']=frame.iloc[:, [1,5]].mean(axis=1)  #average row-wise
newdf['repcycle_avg']=frame.iloc[:, [2,6]].mean(axis=1)
newdf['set_avg']=frame.iloc[:, [3,7]].mean(axis=1)  #average row-wise  
newdf['setcycle_avg']=frame.iloc[:, [4,8]].mean(axis=1)
newdf.columns = ['S', 'Re', 'Rec', 'Se', 'Sec']

The above code does the job, but it is tedious to note the location of every column. I would like to automate this process, since it is repeated for other data files too.

Based on your wish to "automate this process since this is repeated for other data files too", here is what I can think of:

in [1]: import pandas as pd
        frame = pd.read_csv('your path')

The result is shown below. As you can see, the columns you want to average are at positions 1 and 5, 2 and 6, and so on.

out [1]:
    sample  rep   repcycle  set   setcycle  rep1    repcycle1   set1    setcycle1
0   66      40    4         5     3         40      4           5       3
1   78      20    5         6     3         20      5           6       3
2   90      50    6         9     4         50      6           9       4
3   45      70    7         3     2         70      7           7       2
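For anyone who wants to follow along without a CSV file, the table above can be rebuilt in memory. This is only a sketch: the values are copied from the output shown, and the real data in the question is larger (16 columns, 201 rows).

```python
import pandas as pd

# Example data copied from the table above; no CSV file is needed.
frame = pd.DataFrame({
    "sample":    [66, 78, 90, 45],
    "rep":       [40, 20, 50, 70],
    "repcycle":  [4, 5, 6, 7],
    "set":       [5, 6, 9, 3],
    "setcycle":  [3, 3, 4, 2],
    "rep1":      [40, 20, 50, 70],
    "repcycle1": [4, 5, 6, 7],
    "set1":      [5, 6, 9, 7],
    "setcycle1": [3, 3, 4, 2],
})

print(frame)
```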

So we need to create two lists of column positions:

in [2]: import numpy as np
        list_1 = np.arange(1,5,1).tolist()
in [3]: list_1
out[3]: [1,2,3,4]

This covers the first half of the columns you want to average [rep, repcycle, set, setcycle].

in [4]: list_2 = [x+4 for x in list_1]
in [5]: list_2
out[5]: [5,6,7,8]

This covers the second half of the columns you want to average [rep1, repcycle1, set1, setcycle1].

in [6]: result = pd.concat([frame.iloc[:, [x, y]].mean(axis=1) for x, y in zip(list_1, list_2)], axis=1)
in [7]: result.columns = ['Re', 'Rec', 'Se', 'Sec']

Now you get what you want, and it is automated: for another data file, all you need to do is change the two lists above.

in [8]: result
out[8]:
    Re    Rec   Se   Sec
0   40.0  4.0   5.0  3.0
1   20.0  5.0   6.0  3.0
2   50.0  6.0   9.0  4.0
3   70.0  7.0   5.0  2.0
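If keeping track of positions is the tedious part, one possible alternative is to pair the columns by name instead. The sketch below assumes the replicate columns differ only by a trailing digit suffix ("rep" vs "rep1"), as in the example; `average_replicates` and its `id_col` parameter are hypothetical names made up for illustration, not part of pandas.

```python
import re
import pandas as pd

def average_replicates(frame, id_col="sample"):
    """Average columns whose names match once trailing digits are stripped.

    Hypothetical helper; assumes replicates are named like "rep", "rep1".
    """
    data = frame.drop(columns=[id_col])
    # Transpose so column names become the index, group them by base name
    # (e.g. "rep1" -> "rep"), average each group, then transpose back.
    averaged = (
        data.T
        .groupby(lambda name: re.sub(r"\d+$", "", name), sort=False)
        .mean()
        .T
    )
    # Put the identifier column back in front.
    averaged.insert(0, id_col, frame[id_col])
    return averaged

# Same example values as in the answer above.
frame = pd.DataFrame({
    "sample":    [66, 78, 90, 45],
    "rep":       [40, 20, 50, 70],
    "repcycle":  [4, 5, 6, 7],
    "set":       [5, 6, 9, 3],
    "setcycle":  [3, 3, 4, 2],
    "rep1":      [40, 20, 50, 70],
    "repcycle1": [4, 5, 6, 7],
    "set1":      [5, 6, 9, 7],
    "setcycle1": [3, 3, 4, 2],
})

result = average_replicates(frame)
print(result)
```

With this approach nothing has to be edited when the column order changes between files, as long as the naming pattern holds. If the real files use a different naming scheme, the regular expression would need to be adapted.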
