Problems sorting dataframe in pandas, multiple column names

I'm a beginner with Python and pandas. I'm trying to save some results that I produce with a function, but I'm having trouble generating a dataframe with the desired layout. Here is an example of the function, which I call in a loop (to simplify the description I'm using the area of several triangles; my real function is more complicated and has several intermediate steps):

Base = 5
H = [1, 2, 3, 4, 5]

for i in H:

    def Triangle_area():

        H = [i]
        ratio = (Base*H)
        area = np.divide(ratio,2)

        ms = pd.DataFrame(area, columns=[i])
        A = ms[i].mean()
        A1 = pd.DataFrame({'area':A}, index=[i])

        return A1


    areas = Triangle_area()
    print(areas)

The result is a series of separate dataframes, as follows:

   area
1   0.5
   area
2   1.0
   area 
3   1.5
   area
4   2.0
   area
5   2.5

But what I want should look like this:

H   area
1   0.5
2   1.0
3   1.5
4   2.0
5   2.5

I think there must be several ways to do this, but I can't find one. Thanks in advance for your comments.

You're creating and returning a new dataframe on each iteration in your example, which is not what you want. Here is a version of your triangle program that returns a single dataframe with all the results. I hope this helps and that you can see how to apply it to your problem; let me know if you need more help.

import pandas as pd
import numpy as np


def Triangle_area(height, base):
    '''
    Calculate the area of a right angle triangle, Area(height, base) = (base*height)/2
    Put all results in pandas dataframe before returning
    '''
    H = np.array(height)    # Make numpy array of heights, easier for our computations
    ratio = H * base        
    area = ratio/2
    A1 = pd.DataFrame({'H': height, 'area':area}) # Turn results into pandas dataframe

    return A1  # return the dataframe with columns 'H' and 'area'


Base = 5
H = [1, 2, 3, 4, 5]

areas = Triangle_area(H, Base)
print(areas)
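
If the real function has to be called once per height (for instance because of its intermediate steps), one option is to collect the per-call frames in a list and join them once at the end with `pd.concat`. The sketch below is only an illustration: `triangle_area_single` is a hypothetical stand-in for the more complicated function, not code from the question.

```python
import pandas as pd


def triangle_area_single(height, base):
    """Hypothetical stand-in for the real per-height computation."""
    area = base * height / 2
    # One-row frame, mirroring what the original loop produced each time
    return pd.DataFrame({'H': [height], 'area': [area]})


Base = 5
H = [1, 2, 3, 4, 5]

# Build every one-row frame first, then concatenate once at the end
frames = [triangle_area_single(h, Base) for h in H]
areas = pd.concat(frames, ignore_index=True)
print(areas)
```

Concatenating once at the end avoids growing a dataframe row by row inside the loop.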

Always try a vectorized approach first:

In [115]: def Triangle_area(base, h):
     ...:     return base * h / 2.
     ...:

In [116]: df = pd.DataFrame({'base':[1,2,3,4,5], 'h':[5]*5})

In [117]: df
Out[117]:
   base  h
0     1  5
1     2  5
2     3  5
3     4  5
4     5  5

In [118]: df['area'] = Triangle_area(df['base'], df['h'])

In [119]: df
Out[119]:
   base  h  area
0     1  5   2.5
1     2  5   5.0
2     3  5   7.5
3     4  5  10.0
4     5  5  12.5
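
Applied to the numbers from the question (a fixed base of 5 and heights 1 to 5), the same idea might look like the sketch below. Using `set_index('H')` to mimic the desired layout is an assumption about what the asker wants, not something stated in this answer.

```python
import pandas as pd


def triangle_area(base, h):
    # Works on scalars and on whole pandas Series alike
    return base * h / 2.


df = pd.DataFrame({'H': [1, 2, 3, 4, 5]})
df['area'] = triangle_area(5, df['H'])  # the scalar base is applied to every row

# Assumption: the heights should act as row labels, as in the desired output
result = df.set_index('H')
print(result)
```
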
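
import numpy as np
import pandas as pd

Base = 5             # scalar length of the triangle base (global, as in the question)
H = [1, 2, 3, 4, 5]  # triangle heights (global, as in the question)


def Triangle_area():

    '''
    Uses the globals Base (scalar length of triangle base) and
    H (list of various triangle heights).
    @returns A1, dataframe with columns 'H' and 'area'
        corresponding to height and area of triangle with
        that height
    '''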

    ratio = Base*np.array(H)
    area = np.divide(ratio,2)

    A1 = pd.DataFrame({'H':np.array(H),'area':area})

    return A1

areas = Triangle_area()
print(areas)

Here I tried to retain the same function and global variable names as you. A few tricks:

  • Convert the list H into a NumPy array (NumPy works very well with pandas, and I would recommend looking into it if you are not already familiar with it).

  • Instead of using a for loop, operate on the arrays directly.
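
The conversion matters because `*` means different things for a list and for a NumPy array: multiplying a list by an integer repeats the list, while multiplying an array scales each element. A small sketch of the difference (the variable names `repeated`, `scaled` and `area` are illustrative):

```python
import numpy as np

Base = 5
H = [1, 2, 3, 4, 5]

repeated = Base * [2]        # list repetition: [2, 2, 2, 2, 2]
scaled = Base * np.array(H)  # element-wise product: 5, 10, 15, 20, 25
area = scaled / 2            # element-wise division

print(repeated)
print(area)
```

This is also why the loop in the question reports 0.5 rather than 2.5 for a height of 1: `Base*H` there repeats the one-element list `[i]` five times instead of multiplying it by 5, so the mean of the halved values is i/2.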
