
Ensure columns of a pandas dataframe have unique values

Given the following:

import pandas as pd

information_dict_from = {
    "v1": {0: "type a", 1: "type b"},
    "v2": {0: "type a", 1: "type b", 3: "type c"},
    "v3": {0: "type a", 1: "type b"},
}

data_from = pd.DataFrame(
    {
        "v1": [0, 0, 1, 1],
        "v2": [0, 1, 1, 3],
        "v3": [0, 1, 1, 0],
    }
)

I would like to convert it into:


information_dict_to = {
    "v1": {0: "type a", 1: "type b"},
    "v2": {2: "type a", 3: "type b", 4: "type c"},
    "v3": {5: "type a", 6: "type b"},
}

data_to = pd.DataFrame(
    {
        "v1": [0, 0, 1, 1],
        "v2": [2, 3, 3, 4],
        "v3": [5, 6, 6, 5],
    }
)

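Note: after the transformation, the values in the columns of the dataframe are mutually exclusive (set(df['v1']) - set(df['v2']) == set(df['v1'])), and the mapping between the keys of information_dict_from[<var>] and the values of column <var> is preserved.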
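The two conditions in the note can be checked mechanically. The sketch below uses a hypothetical helper, `check_unique_relabel` (not from the original post), and assumes every value in a column appears as a key in that column's inner dict.

```python
from itertools import combinations

import pandas as pd


def check_unique_relabel(data_from, info_from, data_to, info_to):
    """Hypothetical helper: True if data_to/info_to satisfy both conditions."""
    # Condition 1: value sets of any two columns are disjoint, which is
    # equivalent to set(a) - set(b) == set(a) for every pair of columns.
    for a, b in combinations(data_to.columns, 2):
        if set(data_to[a]) & set(data_to[b]):
            return False
    # Condition 2: every cell still resolves to the same label as before.
    for col in data_from.columns:
        old_labels = data_from[col].map(info_from[col])
        new_labels = data_to[col].map(info_to[col])
        if not old_labels.equals(new_labels):
            return False
    return True
```

Applied to the question's `data_from`/`information_dict_from` and the desired `data_to`/`information_dict_to`, it should return True; passing `data_from` itself as the target returns False, because `v1` and `v2` share the values 0 and 1.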

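# copy *_to from *_from (a shallow dict copy is enough, because each
# inner dict is replaced below rather than modified in place)
data_to = data_from.copy()
information_dict_to = information_dict_from.copy()

# running counter that hands out new, globally unique values
val = 0
for col in data_from:  # for each column (v1, v2, v3)
    u_val_map = {}  # old value -> new value for this column
    for u in data_from[col].unique():  # every distinct value in the column
        # build the mask from data_from, not data_to, so cells that were
        # already relabelled can never be matched again
        data_to.loc[data_from[col] == u, col] = val
        u_val_map[u] = val  # record the mapping
        val += 1  # move on to the next unused value
    # re-key this column's dict using the mapping
    # (assumes every key of information_dict_from[col] occurs in the column)
    information_dict_to.update({col: {
        u_val_map[key]: information_dict_from[col][key]
        for key in information_dict_from[col]}})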

Then:

>>>data_to
    v1  v2  v3
0   0   2   5
1   0   3   6
2   1   3   6
3   1   4   5
>>>information_dict_to
{'v1': {0: 'type a', 1: 'type b'},
 'v2': {2: 'type a', 3: 'type b', 4: 'type c'},
 'v3': {5: 'type a', 6: 'type b'}}
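As a hedged alternative (not part of the answer above), the new codes can be derived from the keys of `information_dict_from` instead of from `unique()`, and applied per column with `Series.map`. `relabel_unique` is a hypothetical name. One consequence of this choice is that keys which never occur in the data still get a code (so there is no KeyError), and codes follow the inner dict's key order rather than the order of first appearance in the column.

```python
import pandas as pd


def relabel_unique(data_from, info_from):
    """Hypothetical helper: give every (column, key) pair a globally unique code."""
    data_to = pd.DataFrame(index=data_from.index)
    info_to = {}
    next_code = 0
    for col in data_from.columns:
        # old key -> new code, in the key order of the inner dict
        mapping = {}
        for key in info_from[col]:
            mapping[key] = next_code
            next_code += 1
        # Series.map with a dict looks each old value up in `mapping`
        data_to[col] = data_from[col].map(mapping)
        # re-key the labels with the new codes
        info_to[col] = {mapping[k]: label for k, label in info_from[col].items()}
    return data_to, info_to
```

Calling `relabel_unique(data_from, information_dict_from)` on the question's inputs should give the same `data_to` and `information_dict_to` as printed above, without modifying `data_from`.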


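Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you need to republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.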

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM