Pandas new column using an existing column and a dictionary

I have a DataFrame that looks like this:

import pandas as pd

df = pd.DataFrame({"user_id" : ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'],
                   "score" : [0, 100, 50, 0, 25, 50, 100, 0, 7, 20],
                   "valval" : ["va2.3", "va1.1", "va2.1", "va2.2", "va1.2",
                               "va1.1", "va2.1", "va1.2", "va1.2", "va1.3"]})

print(df)


     | user_id | score | valval 
-----+---------+-------+--------
 0   |     a   |    0  | va2.3  
 1   |     b   |  100  | va1.1  
 2   |     c   |   50  | va2.1  
 3   |     d   |    0  | va2.2  
 4   |     e   |   25  | va1.2  
 5   |     f   |   50  | va1.1  
 6   |     g   |  100  | va2.1  
 7   |     h   |    0  | va1.2  
 8   |     i   |    7  | va1.2  
 9   |     j   |   20  | va1.3  

I also have a dictionary that looks like this:

dic_t = { "key1" : ["va1.1", "va1.2", "va1.3"], "key2" : ["va2.1", "va2.2", "va2.3"]}

I want a new column, "keykey".

For each row, its value should be the dictionary key whose list contains that row's valval value.

The result should look like this:

     | user_id | score | valval | keykey 
-----+---------+-------+--------+--------
 0   |     a   |    0  | va2.3  | key2
 1   |     b   |  100  | va1.1  | key1
 2   |     c   |   50  | va2.1  | key2
 3   |     d   |    0  | va2.2  | key2
 4   |     e   |   25  | va1.2  | key1
 5   |     f   |   50  | va1.1  | key1
 6   |     g   |  100  | va2.1  | key2
 7   |     h   |    0  | va1.2  | key1
 8   |     i   |    7  | va1.2  | key1
 9   |     j   |   20  | va1.3  | key1

You can use Series.map after flattening the dictionary:

# flatten dic_t into a {value: key} lookup, e.g. "va1.1" -> "key1"
d = {val: k for k, v in dic_t.items() for val in v}
df['keykey'] = df['valval'].map(d)

print(df)

  user_id  score valval keykey
0       a      0  va2.3   key2
1       b    100  va1.1   key1
2       c     50  va2.1   key2
3       d      0  va2.2   key2
4       e     25  va1.2   key1
5       f     50  va1.1   key1
6       g    100  va2.1   key2
7       h      0  va1.2   key1
8       i      7  va1.2   key1
9       j     20  va1.3   key1
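To see what the comprehension builds, the sketch below prints the flattened dictionary and maps a small Series that contains a value found in none of the lists ("va9.9" is a made-up value for illustration). Series.map with a dict leaves a missing value (NaN) wherever there is no matching key; fillna can substitute a default, and the "unknown" label here is an arbitrary choice.

```python
import pandas as pd

dic_t = {"key1": ["va1.1", "va1.2", "va1.3"], "key2": ["va2.1", "va2.2", "va2.3"]}

# Invert dic_t: each value in each list becomes a key pointing back to its list's key
d = {val: k for k, v in dic_t.items() for val in v}
print(d)

# "va9.9" is a hypothetical value that appears in none of the lists
s = pd.Series(["va1.1", "va2.3", "va9.9"])
mapped = s.map(d)  # values with no matching key become NaN
print(mapped.tolist())

# Optional: replace unmatched values with a placeholder label
labelled = mapped.fillna("unknown")
print(labelled.tolist())
```

Because the lookup is a plain dict, each row costs a single hash lookup regardless of how many keys or list entries dic_t has.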

Fill an empty dictionary with update() and then use the map function:

import pandas as pd
df = pd.DataFrame({"user_id" : ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'],
                   "score" : [0, 100, 50, 0, 25, 50, 100, 0, 7, 20],
                   "valval" : ["va2.3", "va1.1", "va2.1", "va2.2", "va1.2", "va1.1", "va2.1", "va1.2", "va1.2", "va1.3"]})

dic_t = { "key1" : ["va1.1", "va1.2", "va1.3"], "key2" : ["va2.1", "va2.2", "va2.3"]}

d_keykey = {}
for k, v in dic_t.items():
    for val in v:
        d_keykey.update({val: k})
df["keykey"] = df["valval"].map(d_keykey)
print(df)


  user_id  score valval keykey
0       a      0  va2.3   key2
1       b    100  va1.1   key1
2       c     50  va2.1   key2
3       d      0  va2.2   key2
4       e     25  va1.2   key1
5       f     50  va1.1   key1
6       g    100  va2.1   key2
7       h      0  va1.2   key1
8       i      7  va1.2   key1
9       j     20  va1.3   key1
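Both the comprehension and the update() loop assume each value appears in only one list. If a value were listed under two keys, the later key would silently overwrite the earlier one, because dicts iterate in insertion order (Python 3.7+) and assigning an existing key replaces its entry. The dic_dup dictionary below is a hypothetical example built only to show this, along with one way to detect such duplicates before mapping.

```python
from collections import Counter

# Hypothetical dictionary where "va1.1" is listed under both keys
dic_dup = {"key1": ["va1.1", "va1.2"], "key2": ["va1.1", "va2.1"]}

d_keykey = {}
for k, v in dic_dup.items():
    for val in v:
        d_keykey.update({val: k})  # a repeated val overwrites its earlier entry

print(d_keykey)

# Count how often each value occurs across all lists and keep the repeated ones
counts = Counter(val for v in dic_dup.values() for val in v)
duplicates = [val for val, n in counts.items() if n > 1]
print(duplicates)
```

With the question's dic_t the duplicates list would be empty, so both answers produce the same unambiguous lookup.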

Not the most efficient solution, but it gets the job done and is easy to follow:


def get_keykey(search_val, ref_dict):
    for key in ref_dict:                       # loop over all keys
        if search_val in ref_dict[key]:        # if valval is in this key's list of values, return that key
            return key
    return None                                # no list contains valval

# apply to the valval column of df, passing dic_t as ref_dict
df["keykey"] = df["valval"].apply(get_keykey, args=(dic_t,))
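The reason this is less efficient: get_keykey scans each key's list until it finds a match, so every row may walk through all the lists, whereas the map-based answers do one dictionary lookup per row. The sketch below runs both on the question's data and checks that they agree. One behavioral difference: get_keykey returns None when no list contains the value, while map leaves a missing value (NaN) there.

```python
import pandas as pd

df = pd.DataFrame({
    "user_id": list("abcdefghij"),
    "score": [0, 100, 50, 0, 25, 50, 100, 0, 7, 20],
    "valval": ["va2.3", "va1.1", "va2.1", "va2.2", "va1.2",
               "va1.1", "va2.1", "va1.2", "va1.2", "va1.3"],
})
dic_t = {"key1": ["va1.1", "va1.2", "va1.3"],
         "key2": ["va2.1", "va2.2", "va2.3"]}


def get_keykey(search_val, ref_dict):
    """Return the first key whose list contains search_val, else None."""
    for key, values in ref_dict.items():
        if search_val in values:  # linear scan of this key's list
            return key
    return None


# Row-wise apply: every row may scan every list
via_apply = df["valval"].apply(get_keykey, args=(dic_t,))

# Inverted dict + map: one hash lookup per row
lookup = {val: k for k, v in dic_t.items() for val in v}
via_map = df["valval"].map(lookup)

print(via_apply.tolist() == via_map.tolist())
```

Comparing with tolist() rather than Series.equals avoids depending on which dtype each approach returns in a given pandas version.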


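Statement: technical posts on this site are published under the CC BY-SA 4.0 license. If you repost this content, please credit this site's URL or the original URL. For any questions, contact: yoyou2525@163.com.

Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM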