Can I sort my np.array data once it has been added to a column in a pandas dataframe?
I have a column of np.array data that I add as the last column of my pandas dataframe. However, I need the data inside each np.array sorted in ascending order. (It is not sorted in ascending order in the dataframe from which it is taken.)
Dataframe structure:
GFP_spot_1_position, GFP_spot_2_position, GFP_spot_3_position, ...
0 _ 0.2, 0.4, 0.6, NaN
1 _ 0.8, 0.2, NaN, NaN
2 _ 0.7, 0.5, 0.6, 0.9
3 _ 0.5, NaN, 0.1, NaN
What I want it to look like:
gfp_spots_all
0 _ [0.2, 0.4, 0.6, nan]
1 _ [0.2, 0.8, nan, nan]
2 _ [0.5, 0.6, 0.7, 0.9]
3 _ [0.1, 0.5, nan, nan]
What it actually looks like with the code below:
gfp_spots_all
0 _ [0.2, 0.4, 0.6, NaN]
1 _ [0.8, 0.2, NaN, NaN]
2 _ [0.7, 0.5, 0.6, 0.9]
3 _ [0.5, NaN, 0.1, NaN]
Here's the code I have so far:
import numpy as np
import pandas as pd

df = pd.read_csv('dfall.csv')
dfgfp = df.loc[:, 'GFP_spot_1_position':'GFP_spot_4_position']
# Collect each row's values into a single np.array (not yet sorted)
df['gfp_spots_all'] = dfgfp.apply(lambda r: list(r), axis=1).apply(np.array)
df.head()
I can't seem to sort the values in the array. Please help! Also, I'm new to Python, so I'm learning as I go. Please feel free to correct my sloppy code.
It seems you can; see the code below:
arr = np.array([[3,5,1,7,4,2],[12,18,11,np.nan,np.nan,18]])
df = pd.DataFrame(arr)
print(df)
Output:
0 1 2 3 4 5
0 3.0 5.0 1.0 7.0 4.0 2.0
1 12.0 18.0 11.0 NaN NaN 18.0
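# Sort each row in ascending order (np.sort puts NaN last) and rebuild the frame,
# rather than sorting df.values in place, which only works if it is a writable view
df = pd.DataFrame(np.sort(df.values, axis=1), index=df.index, columns=df.columns)
print(df)
Output: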
0 1 2 3 4 5
0 1.0 2.0 3.0 4.0 5.0 7.0
1 11.0 12.0 18.0 18.0 NaN NaN
But this mismatches values and columns. Did you intend that?
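If that mismatch is not what you want, one option is to leave the original columns untouched and keep the sorted values separately. A minimal sketch on the same toy array; the column name `sorted_values` is only illustrative:

```python
import numpy as np
import pandas as pd

arr = np.array([[3, 5, 1, 7, 4, 2], [12, 18, 11, np.nan, np.nan, 18]])
df = pd.DataFrame(arr)

# np.sort returns a sorted copy, so the original columns keep their alignment
sorted_rows = np.sort(df.to_numpy(), axis=1)

# Store each sorted row as one array in a new (hypothetical) column
df["sorted_values"] = list(sorted_rows)
print(df["sorted_values"])
```

Because the sorted rows live in their own column, the values in columns 0 to 5 still line up with their original labels.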
As per @G. Anderson's comment, adding sorted() to your lambda expression will solve the issue. Actually, quite a bit of the code in your example is redundant:
dfgfp = df.loc[:, 'GFP_spot_1_position':'GFP_spot_4_position']
df['gfp_spots_all'] = dfgfp.apply(lambda r: sorted(r), axis=1)
I believe that will do what you require.
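One caveat worth checking with this data: sorted() orders values using < comparisons, and every comparison against NaN is False, so a row with NaN in the middle is not guaranteed to come out in ascending order. A small sketch comparing it with np.sort on the last row of the example (the variable names are illustrative):

```python
import numpy as np

# Last row of the example data: a NaN sits between the two numbers
row = [0.5, np.nan, 0.1, np.nan]

# sorted() may leave 0.1 after 0.5 because NaN breaks the comparisons
print(sorted(row))

# np.sort orders the numbers and places NaN at the end
ascending = np.sort(row)
print(ascending)
```

If the rows can contain NaN, using np.sort(r.to_numpy()) in the lambda instead of sorted(r) may be the safer choice.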
There must be a more Pythonic way to do it, but here is one way to solve this:
In [1]:
import pandas as pd
# Create the Dataframe
data = {'col1': [[9, 3], [2, 4], [7, 6], [3, 3], [8, 0], [0,4]], 'col2': [[1,3], [9,4], [4,2], [5,1], [3,7], [9,8]]}
df = pd.DataFrame(data=data)
## Loop on each row
for i in range(len(df)):
    ## Loop on each column
    for k in range(len(df.columns)):
        # Sort the list stored in this cell in place
        df.iloc[i, k].sort()
df
Out [1]:
col1 col2
0 [3, 9] [1, 3]
1 [2, 4] [4, 9]
2 [6, 7] [2, 4]
3 [3, 3] [1, 5]
4 [0, 8] [3, 7]
5 [0, 4] [8, 9]
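For what it's worth, the nested loops can probably be avoided. A sketch, assuming every cell holds a plain Python list as above, that maps sorted() over each column (the name `sorted_df` is illustrative):

```python
import pandas as pd

data = {"col1": [[9, 3], [2, 4], [7, 6]], "col2": [[1, 3], [9, 4], [4, 2]]}
df = pd.DataFrame(data)

# Series.map applies sorted() to every list in a column; sorted() returns
# a new list, so the lists in the original frame are left untouched
sorted_df = df.apply(lambda col: col.map(sorted))
print(sorted_df)
```

Unlike the in-place .sort() loop, this returns a new dataframe and keeps the original one unchanged.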
# Here's what worked
df = pd.read_csv('dfall.csv')
dfgfp = df.loc[:, 'GFP_spot_1_position':'GFP_spot_4_position']
df['gfp_spots_all'] = dfgfp.apply(lambda r: list(r), axis=1).apply(np.array)
dfjust = pd.DataFrame([df.gfp_spots_all]).transpose()
## Loop on each row
for i in range(len(dfjust)):
    for k in range(len(dfjust.columns)):
        # Sort the array stored in this cell in place (NaN ends up last)
        dfjust.iloc[i, k].sort()
dfjust.head()
[out:]
gfp_spots_all
0 [3.4165, 19.63, nan, nan]
1 [6.7447, 18.044, nan, nan]
2 [5.088, 10.261, nan, nan]
3 [5.4081, 16.097, nan, nan]
4 [4.2675, nan, nan, nan]
5 rows × 1 columns