
Why have my data values been replaced by "NaN" after using qcut?

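I am working with a pandas DataFrame of 9000 rows and 6 columns. At this point, I am trying to convert the continuous variable "Experience" (years of work) into 4 levels for each of the 4 jobs (Commercial Manager - Business Developer - Web Marketer - Traffic Manager).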

Since the range of years of experience differs from job to job, I used "qcut" to split the data into 4 groups, as follows:

(You can run the code below to get a sample DataFrame.)

import pandas as pd


df = pd.DataFrame({'Job': ['Commercial Manager', 'Traffic Manager', 'Web Marketer', 'Commercial Manager', 'Commercial Manager', 'Web Marketer', 'Commercial Manager', 'Commercial Manager', 'Traffic Manager', 'Business Developer', 'Business Developer', 'Web Marketer', 'Traffic Manager', 'Traffic Manager', 'Commercial Manager', 'Business Developer', 'Traffic Manager', 'Commercial Manager', 'Business Developer', 'Business Developer', 'Web Marketer'], 
                   'Experience': [1.00000, 3.00000, 3.00000, 1.50000, 2.00000, 6.00000, 0.00000, 4.00000, 8.00000, 5.00000, 0.50000, 3.00000, 3.00000, 0.00000, 2.00000, 3.00000, 0.50000, 3.00000, 3.00000, 8.00000, 3.50000]})


levels = ["beginner", "intermediate", "advanced", "expert"]
jobs = ["Commercial Manager", "Business Developer", "Web Marketer", "Traffic Manager"]


def convert(levels, jobs):
  for j in jobs:
    df["Level"] = pd.qcut(df.loc[df["Job"] == j, "Experience"].rank(method="first"), q = 4, labels = levels, duplicates = "drop")
  return df

convert(levels, jobs)

This is the output after using "qcut":

    Job                     Experience       Level 
0   Commercial Manager      1.00000          NaN
1   Traffic Manager         3.00000          intermediate
2   Web Marketer            3.00000          NaN
3   Commercial Manager      1.50000          NaN
4   Commercial Manager      2.00000          NaN
5   Web Marketer            6.00000          NaN
6   Commercial Manager      0.00000          NaN
7   Commercial Manager      4.00000          NaN
8   Traffic Manager         8.00000          expert
9   Business Developer      5.00000          NaN 
10  Business Developer      0.50000          NaN 
11  Web Marketer            3.00000          NaN 
12  Traffic Manager         3.00000          intermediate
13  Traffic Manager         0.00000          beginner
14  Commercial Manager      2.00000          NaN
15  Business Developer      3.00000          NaN
16  Traffic Manager         0.50000          beginner
17  Commercial Manager      3.00000          NaN
18  Business Developer      3.00000          NaN
19  Business Developer      8.00000          NaN
20  Web Marketer            3.50000          NaN

It seems to work only for "Traffic Manager"; the levels for all the other jobs were replaced with NaN. I'm really lost. Any help, please?
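A minimal sketch of what happens inside the loop, using a made-up four-row frame (only the column names come from the question; the jobs "A" and "B" and the values are hypothetical). Assigning a Series that covers only some rows to a column aligns it on the index, so every row missing from that Series becomes NaN, and each later assignment replaces the earlier one.

```python
import pandas as pd

# Hypothetical toy frame; only the column names come from the question
df = pd.DataFrame({
    "Job": ["A", "B", "A", "B"],
    "Experience": [1.0, 2.0, 3.0, 4.0],
})

for job in ["A", "B"]:
    # This Series carries only the index labels of the rows for `job`
    subset = df.loc[df["Job"] == job, "Experience"]
    # Column assignment aligns on the index: rows missing from `subset` become NaN,
    # and the whole column is replaced on every pass of the loop
    df["Level"] = subset * 10

print(df)
```

After the loop only the rows of the last job ("B") hold values and rows 0 and 2 are NaN, the same pattern as in the output above.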

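Your loop reassigns the entire "Level" column on every iteration. The Series you assign contains only the rows of job j, so pandas aligns it on the index and fills every other row with NaN; only the last job in the list, "Traffic Manager", survives. You want to do this in a groupby operation instead: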

import numpy
import pandas

levels = ["beginner", "intermediate", "advanced", "expert"]
jobs = ["Commercial Manager", "Business Developer", "Web Marketer", "Traffic Manager"]

df = pandas.DataFrame({
    'Job': numpy.random.choice(levels, size=150), 
    'Experience': numpy.random.uniform(0.25, 10.5, size=150)
}).assign(
    # the lambda receives the frame being built; a bare `df` here would be an old or undefined df
    level=lambda d: d.groupby('Job')['Experience'] # for each unique job...
            # apply a quantile (quartile) cut; transform keeps the result aligned to the original rows
            .transform(lambda g: pandas.qcut(g, q=4, labels=levels, duplicates="drop"))
)
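To see why a plain qcut can still fail on real data, here is a small check on the four Web Marketer values from the question's sample data (3.0, 6.0, 3.0, 3.5). Their quartile edges contain a repeated 3.0, so duplicates="drop" leaves three bins for four labels and pd.qcut raises a ValueError. Ranking with method="first" makes every value distinct before cutting. This is a sketch of the general failure mode, not a reproduction of any specific error from the thread.

```python
import pandas as pd

levels = ["beginner", "intermediate", "advanced", "expert"]
# Experience values of the Web Marketer rows in the question's sample data
experience = pd.Series([3.0, 6.0, 3.0, 3.5])

try:
    pd.qcut(experience, q=4, labels=levels, duplicates="drop")
    plain_error = None
except ValueError as exc:
    # A duplicate bin edge was dropped, so 4 labels no longer fit the 3 remaining bins
    plain_error = str(exc)

# rank(method="first") breaks ties by order of appearance, so all ranks are unique
ranked = pd.qcut(experience.rank(method="first"), q=4, labels=levels)

print(plain_error)
print(ranked.tolist())
```

The trade-off: the two identical 3.0 values end up in different levels ("beginner" and "intermediate"), decided only by row order.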
I would change just two things in what Paul suggested ("jobs" instead of "levels" for the sample Job column, and rank(method="first")), because there was still an error:

import numpy
import pandas

levels = ["beginner", "intermediate", "advanced", "expert"]
jobs = ["Commercial Manager", "Business Developer", "Web Marketer", "Traffic Manager"]

df = pandas.DataFrame({
  'Job': numpy.random.choice(jobs, size=150), 
  'Experience': numpy.random.uniform(0.25, 10.5, size=150)
}).assign(
  level=lambda d: d.groupby('Job')['Experience'] # for each unique job...
        # apply a quantile (quartile) cut to the ranks; transform keeps the original row index
        .transform(lambda g: pandas.qcut(g.rank(method="first"), q=4, labels=levels, duplicates="drop"))
)
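If you would rather keep the loop from the question, a minimal sketch is to write each job's levels into that job's rows only, using .loc with a boolean mask instead of reassigning the whole column. It reuses the question's sample data; passing df into convert and working on a copy are my own changes, not part of either answer. "Level" starts as an object column of None and each cut is converted with .astype(object) so no categorical values are mixed into it.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'Job': ['Commercial Manager', 'Traffic Manager', 'Web Marketer', 'Commercial Manager',
            'Commercial Manager', 'Web Marketer', 'Commercial Manager', 'Commercial Manager',
            'Traffic Manager', 'Business Developer', 'Business Developer', 'Web Marketer',
            'Traffic Manager', 'Traffic Manager', 'Commercial Manager', 'Business Developer',
            'Traffic Manager', 'Commercial Manager', 'Business Developer', 'Business Developer',
            'Web Marketer'],
    'Experience': [1.0, 3.0, 3.0, 1.5, 2.0, 6.0, 0.0, 4.0, 8.0, 5.0, 0.5,
                   3.0, 3.0, 0.0, 2.0, 3.0, 0.5, 3.0, 3.0, 8.0, 3.5],
})

levels = ["beginner", "intermediate", "advanced", "expert"]
jobs = ["Commercial Manager", "Business Developer", "Web Marketer", "Traffic Manager"]


def convert(df, levels, jobs):
    out = df.copy()
    out["Level"] = None  # object column, filled in job by job below
    for j in jobs:
        mask = out["Job"] == j
        # Quartile-cut the ranks of this job's Experience values only
        cut = pd.qcut(out.loc[mask, "Experience"].rank(method="first"), q=4, labels=levels)
        # Write into this job's rows only; rows of other jobs keep their earlier results
        out.loc[mask, "Level"] = cut.astype(object)
    return out


result = convert(df, levels, jobs)
print(result)
```

Compared with the groupby version this keeps the original function shape; ties are still broken by row order because of rank(method="first").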


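Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.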

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM