How to move data in a numpy array from column/row to another based on the value in a third column

Question

I am trying to rearrange my data to get from this:

to this:

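Basically, I am trying to compress 5 rows of data (each with 1 ID and 2 values) into 1 row of data (with 1 ID and 10 values). My data has roughly 6 million rows. One thing to note: not every group has 5 (X, Y) coordinate pairs. Some have only 4.

I could not figure out how to do this with indexing alone, so I wrote a for loop, which does not work very well. It processes the first 10,000 rows correctly (although it ends with an error), but it takes forever.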

import numpy as np
import pandas as pd

coords = pd.read_csv('IDQQCoords.csv')
coords = coords.values

# creates a zero-filled array with the same number of rows as coords
mpty = np.zeros((len(coords), 8), dtype=float)

# appends the 8 zero-filled columns to coords,
# making space for the values from subsequent rows
coords = np.append(coords, mpty, axis=1)



cnt = 0
lth = coords.shape[0]
for counter in range(1,lth):

    if coords[cnt+1,0] == coords[cnt,0]:
        coords[cnt,3:5] = coords[cnt+1,1:3]        
        coords = np.delete(coords,cnt+1,axis=0)

    if coords[cnt+1,0] == coords[cnt,0]:
        coords[cnt,5:7] = coords[cnt+1,1:3]       
        coords = np.delete(coords,cnt+1,axis=0)

    if coords[cnt+1,0] == coords[cnt,0]:
        coords[cnt,7:9] = coords[cnt+1,1:3]
        coords = np.delete(coords,cnt+1,axis=0)

    if coords[cnt+1,0] == coords[cnt,0]:
        coords[cnt,9:11] = coords[cnt+1,1:3]        
        coords = np.delete(coords,cnt+1,axis=0)

    cnt = cnt+1

Can someone help me with indexing or a better loop?

Many thanks

Answer 1

Assuming that

coords = pd.read_csv('IDQQCoords.csv')

implies that you are using Pandas, the easiest way to produce the desired result is to use DataFrame.pivot:

import pandas as pd
import numpy as np
np.random.seed(2016)

df = pd.DataFrame({'shapeid': [0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 2],
               'x': np.random.random(14),
               'y': np.random.random(14)}) 
#     shapeid         x         y
# 0         0  0.896705  0.603638
# 1         0  0.730239  0.588791
# 2         0  0.783276  0.069347
# 3         0  0.741652  0.942829
# 4         0  0.462090  0.372599
# 5         1  0.642565  0.451989
# 6         1  0.224864  0.450841
# 7         1  0.708547  0.033112
# 8         1  0.747126  0.169423
# 9         2  0.625107  0.180155
# 10        2  0.579956  0.352746
# 11        2  0.242640  0.342806
# 12        2  0.131956  0.277638
# 13        2  0.143948  0.375779

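# number the rows within each shapeid group: 0, 1, 2, ...
df['col'] = df.groupby('shapeid').cumcount()
# one row per shapeid; the columns become a MultiIndex of (x or y, col)
df = df.pivot(index='shapeid', columns='col')
# reorder the columns so each point's x and y sit side by side
df = df.sort_index(axis=1, level=1)
# flatten the MultiIndex into names such as x0, y0, x1, y1, ...
df.columns = ['{}{}'.format(col, num) for col,num in df.columns]
print(df)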

yields

               x0        y0        x1        y1        x2        y2        x3  \
shapeid                                                                         
0        0.896705  0.603638  0.730239  0.588791  0.783276  0.069347  0.741652   
1        0.642565  0.451989  0.224864  0.450841  0.708547  0.033112  0.747126   
2        0.625107  0.180155  0.579956  0.352746  0.242640  0.342806  0.131956   

               y3        x4        y4  
shapeid                                
0        0.942829  0.462090  0.372599  
1        0.169423       NaN       NaN  
2        0.277638  0.143948  0.375779
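
Since the question starts from a NumPy array, the same reshaping can probably also be done without Pandas. The following is only a sketch, not part of the pivot approach above: the function name widen and its max_points parameter are made up for illustration, and it assumes the array is non-empty, has the columns [id, x, y], and is already sorted so that rows with the same ID are adjacent. Groups with fewer points are padded with NaN, as in the pivot output.

```python
import numpy as np

def widen(coords, max_points=5):
    """Hypothetical helper: turn rows of [id, x, y] into one row per id.

    Assumes coords is a non-empty float array whose rows with equal ids
    are contiguous, and that no id has more than max_points rows.
    """
    ids = coords[:, 0]
    # index of the first row of each run of equal ids
    starts = np.flatnonzero(np.r_[True, ids[1:] != ids[:-1]])
    # number of rows in each run
    counts = np.diff(np.r_[starts, len(ids)])
    # position of every row inside its own run: 0, 1, 2, ...
    pos = np.arange(len(ids)) - np.repeat(starts, counts)
    # output row that every input row belongs to
    row = np.repeat(np.arange(len(starts)), counts)

    # one output row per id, pre-filled with NaN for missing points
    out = np.full((len(starts), 1 + 2 * max_points), np.nan)
    out[:, 0] = ids[starts]
    out[row, 1 + 2 * pos] = coords[:, 1]  # x values go to columns 1, 3, 5, ...
    out[row, 2 + 2 * pos] = coords[:, 2]  # y values go to columns 2, 4, 6, ...
    return out

# small made-up input: id 7 has two points, id 9 has one
demo = np.array([[7., 1., 2.],
                 [7., 3., 4.],
                 [9., 5., 6.]])
wide = widen(demo, max_points=2)
# wide has the rows [7, 1, 2, 3, 4] and [9, 5, 6, nan, nan]
```

Because every step is a whole-array operation, this avoids the repeated np.delete calls of the loop in the question, each of which copies the entire array. If any ID has more than max_points rows, the fancy-indexed assignment raises an IndexError rather than silently dropping data.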
