Map values to separate column - pandas

I'm trying to map values from one column to a separate column. In the code below, the calculate_distances function measures the distance from each point to the nearest point in each Group. It also returns the index value of that nearest point for identification.

This all works fine. But instead of the index value, I'd like to map the corresponding ID value to the output from within the function.

If I don't map the ID values across, both nearest_object columns display the index value instead of the actual ID value.

My mapping attempt is commented out in the code below so that the current output can be shown.

from sklearn.neighbors import BallTree
import pandas as pd

df = pd.DataFrame({              
    'Time' : [1,1,1,1,1,1,2,2,2,2,2,2],             
    'ID' : ['A','B','C','X','U','V','A','B','C','X','U','V'],      
    'Group' : ['Red','Red','Red','Grn','Grn','Grn','Red','Red','Red','Grn','Grn','Grn'],           
    'X' : [2.0,3.0,4.0,2.0,2.0,1.0,1.0,6.0,4.0,2.0,5.0,3.0],
    'Y' : [3.0,1.0,0.0,0.0,2.0,1.0,2.0,0.0,1.0,1.0,0.0,0.0],           
    })

def calculate_distances(df, group_column='Group'):    

    '''
    Calculate distance and id to both red and green groups.
    '''
    # unq groups
    groups = df[group_column].unique()

    all_points = df[['X','Y']].values

    for group in groups:
        group_points = df[df[group_column] == group][['X','Y']]
    
        # calculate distance between points
        tree = BallTree(group_points, leaf_size=15, metric='minkowski')

        distance, index = tree.query(all_points, k=1)
        distances = distance[:,0]
        nearest_id = group_points.index[index[:,0]]
                    
        distance_column_name = "distance_{}".format( group )
        df[ distance_column_name ] = distances
    
        distance_column_nearest_name = "nearest_object_{}".format( group )
        df[distance_column_nearest_name] = nearest_id   

    # map ID values
    #df.iloc[:,-3] = df.iloc[:,-3].map(df.set_index('index')['ID']) 
    #df.iloc[:,-1] = df.iloc[:,-1].map(df.set_index('index')['ID'])       

    return df

df = df.groupby(['Time']).apply(calculate_distances).reset_index()

Current output:

    Time ID Group    X    Y  distance_Red  nearest_object_Red  distance_Grn  nearest_object_Grn
0      1  A   Red  2.0  3.0      0.000000                   0      1.000000                   4
1      1  B   Red  3.0  1.0      0.000000                   1      1.414214                   3
2      1  C   Red  4.0  0.0      0.000000                   2      2.000000                   3
3      1  X   Grn  2.0  0.0      1.414214                   1      0.000000                   3
4      1  U   Grn  2.0  2.0      1.000000                   0      0.000000                   4
5      1  V   Grn  1.0  1.0      2.000000                   1      0.000000                   5
6      2  A   Red  1.0  2.0      0.000000                   6      1.414214                   9
7      2  B   Red  6.0  0.0      0.000000                   7      1.000000                  10
8      2  C   Red  4.0  1.0      0.000000                   8      1.414214                  10
9      2  X   Grn  2.0  1.0      1.414214                   6      0.000000                   9
10     2  U   Grn  5.0  0.0      1.000000                   7      0.000000                  10
11     2  V   Grn  3.0  0.0      1.414214                   8      0.000000                  11

Intended output:

    Time ID Group    X    Y  distance_Red nearest_object_Red  distance_Grn nearest_object_Grn
0      1  A   Red  2.0  3.0      0.000000                  A      1.000000                  U
1      1  B   Red  3.0  1.0      0.000000                  B      1.414214                  X
2      1  C   Red  4.0  0.0      0.000000                  C      2.000000                  X
3      1  X   Grn  2.0  0.0      1.414214                  B      0.000000                  X
4      1  U   Grn  2.0  2.0      1.000000                  A      0.000000                  U
5      1  V   Grn  1.0  1.0      2.000000                  B      0.000000                  V
6      2  A   Red  1.0  2.0      0.000000                  A      1.414214                  X
7      2  B   Red  6.0  0.0      0.000000                  B      1.000000                  U
8      2  C   Red  4.0  1.0      0.000000                  C      1.414214                  U
9      2  X   Grn  2.0  1.0      1.414214                  A      0.000000                  X
10     2  U   Grn  5.0  0.0      1.000000                  B      0.000000                  U
11     2  V   Grn  3.0  0.0      1.414214                  C      0.000000                  V

This happens because the index level is named Time while the dataframe already has a column with the same name. When you call reset_index, pandas tries to turn the index into a normal column, and here that fails because of the duplicate name. Try:

df = df.groupby(['Time']).apply(calculate_distances).reset_index(drop=True)
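
For the ID mapping itself, one option is to look the ID up at the point where the nearest neighbour is found, rather than mapping the index afterwards. In the question's function this would probably amount to replacing nearest_id with something like df.loc[nearest_id, 'ID'].to_numpy(), though that exact line is an assumption and has not been checked against the original setup. The sketch below shows the same idea end to end. It swaps BallTree for a brute-force NumPy distance matrix so that it runs without scikit-learn, and it loops over the Time slices and concatenates them instead of using groupby().apply(). The helper name add_nearest_ids is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Time': [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
    'ID': ['A', 'B', 'C', 'X', 'U', 'V', 'A', 'B', 'C', 'X', 'U', 'V'],
    'Group': ['Red', 'Red', 'Red', 'Grn', 'Grn', 'Grn',
              'Red', 'Red', 'Red', 'Grn', 'Grn', 'Grn'],
    'X': [2.0, 3.0, 4.0, 2.0, 2.0, 1.0, 1.0, 6.0, 4.0, 2.0, 5.0, 3.0],
    'Y': [3.0, 1.0, 0.0, 0.0, 2.0, 1.0, 2.0, 0.0, 1.0, 1.0, 0.0, 0.0],
})

def add_nearest_ids(frame, group_column='Group'):
    """Hypothetical helper: add distance and nearest ID per group."""
    frame = frame.copy()
    all_points = frame[['X', 'Y']].to_numpy()

    for group in frame[group_column].unique():
        members = frame[frame[group_column] == group]
        group_points = members[['X', 'Y']].to_numpy()

        # Brute-force Euclidean distances, shape (n_all, n_group).
        # This stands in for the BallTree query in the question.
        diff = all_points[:, None, :] - group_points[None, :, :]
        dist = np.sqrt((diff ** 2).sum(axis=2))

        # Position of the nearest group member for every point.
        pos = dist.argmin(axis=1)

        frame['distance_{}'.format(group)] = dist.min(axis=1)
        # Positional lookup into the group's ID values, so the
        # original index labels never enter the result.
        frame['nearest_object_{}'.format(group)] = members['ID'].to_numpy()[pos]

    return frame

# Process each Time slice separately, then rebuild a clean 0..n-1 index.
parts = [add_nearest_ids(part) for _, part in df.groupby('Time')]
result = pd.concat(parts, ignore_index=True)

print(result)
```

Because the lookup is positional within each group, it does not depend on what the index looks like after grouping. Ties between equally distant points are resolved by argmin in favour of the first group member, which may differ from how BallTree breaks ties.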
