Trying to replace a pd.DataFrame column by index and item in enumerate. I get a different length

Question

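I'm a bit stuck in a loop involving dictionaries, pandas Series and DataFrames, and I'd like some advice on it.

The problem I'm dealing with is as follows:

I have a pd.DataFrame with a couple of columns named:

df["country"] states whether the person is from Spain or from the UK
df["lvl"] states the person's employee level within the company

Note that, for some reason I don't know, these levels don't match the ones I need to use, so I thought of using dictionaries/tuples to replace them.

So these are the dictionaries. I use them to avoid any mistakes with the keys, and then I convert them into tuples, because several keys map to the same value.

lvls_uk - the levels for the UK
lvls_sp - the levels for Spain

So my goal is to replace each employee's level with the one from the dictionary, depending on where he/she is based (UK or SP). The problem is that when I run the loop, it produces more entries than the length of my column, which is 513 (514 if you count 0).

My code is as follows: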

# I need to go from 25 different levels to 7; the 0 is to avoid errors, since a few rows don't have any type of level.

lvls_sp = {
    "25": "1", "24": "1", "23": "1",
    "22": "2", "21": "2",
    "20": "3", "19": "3", "18": "3", "17": "3",
    "16": "4", "15": "4",
    "14": "5", "13": "5", "12": "5",
    "11": "6", "10": "6", "9": "6",
    "8": "7", "7": "7", "6": "7", "0": "0",
}


lvls_uk = {
    "25": "s8",
    "24": "s7h", "23": "s7h",
    "22": "s7", "21": "s7",
    "20": "s6", "19": "s6", "18": "s6",
    "17": "s5",
    "15": "s4", "16": "s4",
    "14": "s3", "13": "s3",
    "12": "s2h",
    "11": "s2", "10": "s2", "9": "s2",
    "8": "s2l", "7": "s2l",
    "6": "s1", "0": "0",
}

# I create the tuples:

tupla_sp = list()
tupla_uk = list()

for k,v in lvls_sp.items():
    tupla_sp.append((v, k))
    
for k,v in lvls_uk.items():
    tupla_uk.append((v, k))





# I tried different options but I finally decided to go by index and item

# Create a list to append them and convert it into a pd.series after
r_levels = []


for index, item in enumerate(df["country"]):
    if item == "SP":
        for k, v in tupla_sp:
            if df["lvl"][index] in v:
                r_levels.append(k)
    if item == "UK":
        for k, v in tupla_uk:
            if df["lvl"][index] in v:
                r_levels.append(k)

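But whenever I run it, I get a list of length 623.

Do you see any mistake in the code? I'm open to trying a different approach. This is my first time doing this and I've been stuck for a few days.

Thanks a lot!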

Answer 1

Maybe you could do something like

df_spain = df[df["country"] == "SP"]
df_uk = df[df["country"] == "UK"]

Since you already have the dictionaries set up

df_spain = df_spain.replace(to_replace=lvls_sp)
df_uk = df_uk.replace(to_replace=lvls_uk)

df = pd.concat([df_spain, df_uk])
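
For what it's worth, here is a small self-contained sketch of the same split/replace/concat idea. The sample DataFrame and the shortened dictionaries are made up for illustration, since the real data isn't shown in the question. It restricts the replacement to the "lvl" column (calling replace on the whole frame would also touch matching strings in other columns) and sorts by index at the end, because concat puts all the Spanish rows first.

```python
import pandas as pd

# Hypothetical sample data; the real df is not shown in the question.
df = pd.DataFrame({
    "country": ["SP", "UK", "SP", "UK"],
    "lvl": ["25", "25", "10", "0"],
})

# Shortened versions of the question's dictionaries, for illustration only.
lvls_sp = {"25": "1", "10": "6", "0": "0"}
lvls_uk = {"25": "s8", "10": "s2", "0": "0"}

# Work on copies so the original frame is not modified through a slice.
df_spain = df[df["country"] == "SP"].copy()
df_uk = df[df["country"] == "UK"].copy()

# Replace only the "lvl" column so other columns are never touched.
df_spain["lvl"] = df_spain["lvl"].replace(lvls_sp)
df_uk["lvl"] = df_uk["lvl"].replace(lvls_uk)

# concat stacks the Spanish rows first; sort_index restores the original order.
result = pd.concat([df_spain, df_uk]).sort_index()
print(result)
```

The row count stays the same as in the input, since every row ends up in exactly one of the two subsets (assuming the country column only contains "SP" and "UK").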

Answer 2

This should do the trick. It takes all the employees from one country at a time and replaces their levels:

df.loc[df['country']=='UK', 'lvl'] = df.loc[df['country']=='UK', 'lvl'].replace(lvls_uk)
df.loc[df['country']=='SP', 'lvl'] = df.loc[df['country']=='SP', 'lvl'].replace(lvls_sp)

Also, in general you should avoid loops when working with DataFrames, because they become very slow on large amounts of data.
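
As a rough illustration of a version without a row-by-row loop, here is a sketch on made-up sample data. The lvls_by_country dictionary and the new_lvl column are names I introduced for the example, and the mappings are shortened. It loops over the two countries only, and uses Series.map instead of replace, so any level missing from a dictionary shows up as NaN rather than being silently left unchanged.

```python
import pandas as pd

# Hypothetical sample data; column names follow the question.
df = pd.DataFrame({
    "country": ["UK", "SP", "UK", "SP", "SP"],
    "lvl": ["20", "20", "0", "6", "13"],
})

# Shortened mappings grouped per country (illustrative subset of the question's dicts).
lvls_by_country = {
    "SP": {"20": "3", "13": "5", "6": "7", "0": "0"},
    "UK": {"20": "s6", "13": "s3", "6": "s1", "0": "0"},
}

# Write into a new column so the original levels stay available for checking.
df["new_lvl"] = df["lvl"]
for country, mapping in lvls_by_country.items():
    mask = df["country"] == country
    # map does an exact dictionary lookup per value; unmapped levels become NaN.
    df.loc[mask, "new_lvl"] = df.loc[mask, "lvl"].map(mapping)

print(df)
```

Because the result is assigned back through the same boolean mask, the new column always has exactly one value per row.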

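The reason your code doesn't work as expected is that you don't check whether the level in the DataFrame is equal to the level in the tuple, only whether it is contained in it. This is a problem because, as strings, the level "0" is also part of the levels "20" and "10". The following should fix your code, although, as mentioned above, it is not the recommended approach: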

# Create a list to append them and convert it into a pd.series after
r_levels = []


for index, item in enumerate(df["country"]):
    if item == "SP":
        for k, v in tupla_sp:
            if df["lvl"][index] == v:
                r_levels.append(k)
    if item == "UK":
        for k, v in tupla_uk:
            if df["lvl"][index] == v:
                r_levels.append(k)
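
To make the difference between in and == concrete, here is a tiny standalone check. The list of level strings is just a sample taken from the dictionary keys. With in, Python does a substring test on strings, so a single row can match several keys and append several items, which is most likely why the list ends up longer than the column.

```python
# A few of the level keys from the question, as strings.
level_keys = ["25", "20", "10", "0"]

# The level stored in one DataFrame row.
level = "0"

# Substring test: "0" is found inside "20" and "10" as well as "0".
substring_hits = [k for k in level_keys if level in k]

# Equality test: only the exact key matches.
exact_hits = [k for k in level_keys if level == k]

print(substring_hits)  # ['20', '10', '0']
print(exact_hits)      # ['0']
```

Note that even with ==, a row whose level is missing from the dictionary (or whose country is neither "SP" nor "UK") appends nothing, so the list could still come out shorter than the column.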
