
Can I sort my np.array data once it has been added to a column in a pandas dataframe?

I have a column of np.array data that I add as the last column of my pandas dataframe. However, I need the data inside each np.array sorted in ascending order. (It is not sorted in ascending order in the dataframe it is taken from.)

Dataframe structure:

    GFP_spot_1_position, GFP_spot_2_position, GFP_spot_3_position, ...  
    0 _        0.2,                 0.4,              0.6,              NaN          
    1 _        0.8,                 0.2,              NaN,              NaN         
    2 _        0.7,                 0.5,              0.6,              0.9      
    3 _        0.5,                 NaN,              0.1,              NaN      

What I want it to look like:

    gfp_spots_all                         
    0 _ [0.2, 0.4, 0.6, nan]             
    1 _ [0.2, 0.8, nan, nan]               
    2 _ [0.5, 0.6, 0.7, 0.9]            
    3 _ [0.1, 0.5, nan, nan] 

What it actually looks like with the code below:

    gfp_spots_all
    0 _ [0.2, 0.4, 0.6, NaN]
    1 _ [0.8, 0.2, NaN, NaN]
    2 _ [0.7, 0.5, 0.6, 0.9]
    3 _ [0.5, NaN, 0.1, NaN]

Here's the code I have so far:

import numpy as np
import pandas as pd

df = pd.read_csv('dfall.csv')

dfgfp = df.loc[:, 'GFP_spot_1_position':'GFP_spot_4_position']

df['gfp_spots_all'] = dfgfp.apply(lambda r: list(r), axis=1).apply(np.array)

df.head()

I can't seem to sort the values in the array. Please help! Also, I'm new to Python, so I'm learning as I go. Please feel free to correct my sloppy code.
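Because dfall.csv isn't included, here is a minimal self-contained reproduction, assuming the four-row sample table above can stand in for the file (the in-memory DataFrame is an assumption, not the real data). Building the column the same way as the question keeps each row in its original, unsorted order:

```python
import numpy as np
import pandas as pd

# Assumed stand-in for dfall.csv: the sample values from the table above
df = pd.DataFrame({
    'GFP_spot_1_position': [0.2, 0.8, 0.7, 0.5],
    'GFP_spot_2_position': [0.4, 0.2, 0.5, np.nan],
    'GFP_spot_3_position': [0.6, np.nan, 0.6, 0.1],
    'GFP_spot_4_position': [np.nan, np.nan, 0.9, np.nan],
})

dfgfp = df.loc[:, 'GFP_spot_1_position':'GFP_spot_4_position']

# Same construction as in the question: one array per row, in column order
df['gfp_spots_all'] = dfgfp.apply(lambda r: list(r), axis=1).apply(np.array)

print(df['gfp_spots_all'])
```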

It seems you can; see the code below:

import numpy as np
import pandas as pd

arr = np.array([[3,5,1,7,4,2],[12,18,11,np.nan,np.nan,18]])
df = pd.DataFrame(arr)
print(df)

Output:

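      0     1     2    3    4     5
0   3.0   5.0   1.0  7.0  4.0   2.0
1  12.0  18.0  11.0  NaN  NaN  18.0

# Sort each row of the underlying array in place (NaN goes last);
# this only changes df when df.values is a view of the frame's data
np.ndarray.sort(df.values)
print(df)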

Output:

     0     1     2     3    4    5
0   1.0   2.0   3.0   4.0  5.0  7.0
1  11.0  12.0  18.0  18.0  NaN  NaN

But this will mismatch values and columns (a value can end up under a different column label). Did you intend that?
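Sorting `df.values` in place only affects `df` when `.values` is a writable view of the frame's data. That is not guaranteed: with mixed dtypes `.values` is a copy, and under pandas' Copy-on-Write mode the returned array can be read-only. A minimal sketch that instead builds a new, row-sorted frame with `np.sort(..., axis=1)`, keeping the same index and column labels (so the same value/column mismatch applies):

```python
import numpy as np
import pandas as pd

arr = np.array([[3, 5, 1, 7, 4, 2], [12, 18, 11, np.nan, np.nan, 18]])
df = pd.DataFrame(arr)

# np.sort returns a sorted copy; axis=1 sorts within each row, NaN last
sorted_df = pd.DataFrame(np.sort(df.to_numpy(), axis=1),
                         index=df.index, columns=df.columns)
print(sorted_df)
```

The original `df` is left unchanged, which also makes it easy to compare the two.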

As per @G. Anderson's comment, adding sorted() to your lambda expression will solve the issue. Actually, quite a bit of the code in your example is redundant:

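import numpy as np

dfgfp = df.loc[:, 'GFP_spot_1_position':'GFP_spot_4_position']
# np.sort puts NaN last; the built-in sorted() cannot order NaN reliably
df['gfp_spots_all'] = dfgfp.apply(lambda r: np.sort(r.to_numpy()), axis=1)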

I believe that will do what you require.
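One caveat with `sorted()` on this data: every comparison involving NaN returns False, so Python's built-in sort has no consistent place to put NaN and can leave a row such as `[0.5, NaN, 0.1, NaN]` out of order. `np.sort` instead sorts NaN to the end, which matches the desired output in the question. A small sketch comparing the two on that row (row 3 of the sample table):

```python
import numpy as np

# Row 3 of the question's sample table
row = np.array([0.5, np.nan, 0.1, np.nan])

# Built-in sorted(): comparisons with NaN are always False, so the result
# is not guaranteed to be ascending when NaN is present
builtin_result = sorted(row)

# np.sort: ascending order with NaN placed at the end
numpy_result = np.sort(row)

print(builtin_result)
print(numpy_result)
```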

There must be a more Pythonic way to do it, but here is one way to solve this:

In [1]:
import pandas as pd

# Create the Dataframe
data = {'col1': [[9, 3], [2, 4], [7, 6], [3, 3], [8, 0], [0,4]], 'col2': [[1,3], [9,4], [4,2], [5,1], [3,7], [9,8]]}
df = pd.DataFrame(data=data)

## Loop on each row
for i in range(len(df)):
    ## Loop on each column
    for k in range(len(df.columns)):
        # iat fetches the list stored at (row i, column k); .sort() sorts it in place
        df.iat[i, k].sort()

df

Out [1]:
    col1    col2
0   [3, 9]  [1, 3]
1   [2, 4]  [4, 9]
2   [6, 7]  [2, 4]
3   [3, 3]  [1, 5]
4   [0, 8]  [3, 7]
5   [0, 4]  [8, 9]
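As one possibly more Pythonic alternative, here is a sketch that avoids the explicit index loops and the in-place mutation: `Series.map(sorted)` builds a new sorted list for every cell of a column, and `DataFrame.apply` does that column by column, leaving the original `df` untouched:

```python
import pandas as pd

data = {'col1': [[9, 3], [2, 4], [7, 6], [3, 3], [8, 0], [0, 4]],
        'col2': [[1, 3], [9, 4], [4, 2], [5, 1], [3, 7], [9, 8]]}
df = pd.DataFrame(data=data)

# For each column, call sorted() on every list and collect the new lists
sorted_df = df.apply(lambda col: col.map(sorted))
print(sorted_df)
```

On pandas 2.1 or later, `df.map(sorted)` does the same thing (`DataFrame.map` replaced the older `applymap`). Plain `sorted()` is fine here because these lists contain no NaN.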

# Here's what worked
import numpy as np
import pandas as pd

df = pd.read_csv('dfall.csv')
dfgfp = df.loc[:, 'GFP_spot_1_position':'GFP_spot_4_position']
df['gfp_spots_all'] = dfgfp.apply(lambda r: list(r), axis=1).apply(np.array)
dfjust = pd.DataFrame([df.gfp_spots_all]).transpose()

## Loop on each row
for i in range(len(dfjust)):
    for k in range(len(dfjust.columns)):
        # ndarray.sort() sorts the stored array in place, with NaN last
        dfjust.iat[i, k].sort()

dfjust.head()

[out:]
    gfp_spots_all
0   [3.4165, 19.63, nan, nan]                       
1   [6.7447, 18.044, nan, nan]         
2   [5.088, 10.261, nan, nan]         
3   [5.4081, 16.097, nan, nan]     
4   [4.2675, nan, nan, nan]      


5 rows × 1 columns
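Since the column already holds one NumPy array per row, the loop isn't strictly needed to sort it after the fact: `Series.apply(np.sort)` returns a sorted copy of each array (NaN last), and the result can be assigned back to the column. A sketch, assuming a small stand-in frame built from the question's sample values instead of dfall.csv:

```python
import numpy as np
import pandas as pd

# Assumed stand-in for the question's data (sample table values)
dfgfp = pd.DataFrame([[0.2, 0.4, 0.6, np.nan],
                      [0.8, 0.2, np.nan, np.nan],
                      [0.7, 0.5, 0.6, 0.9],
                      [0.5, np.nan, 0.1, np.nan]])
df = pd.DataFrame(index=dfgfp.index)
df['gfp_spots_all'] = dfgfp.apply(lambda r: list(r), axis=1).apply(np.array)

# Replace each stored array with a sorted copy (ascending, NaN last)
df['gfp_spots_all'] = df['gfp_spots_all'].apply(np.sort)
print(df)
```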
