Why have my data values been replaced by "NaN" after using qcut?
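I'm working with a pandas DataFrame of 9000 rows and 6 columns. At this point, I'm trying to convert the continuous variable "Experience" (years of experience) into 4 levels for each of the 4 jobs (Commercial Manager, Business Developer, Web Marketer, Traffic Manager).
Since the range of years of experience differs from one job to another, I used qcut to split the data into 4 groups, as follows:
(You can run the code below to get a sample DataFrame.)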
import pandas as pd
df = pd.DataFrame({'Job': ['Commercial Manager', 'Traffic Manager', 'Web Marketer', 'Commercial Manager', 'Commercial Manager', 'Web Marketer', 'Commercial Manager', 'Commercial Manager', 'Traffic Manager', 'Business Developer', 'Business Developer', 'Web Marketer', 'Traffic Manager', 'Traffic Manager', 'Commercial Manager', 'Business Developer', 'Traffic Manager', 'Commercial Manager', 'Business Developer', 'Business Developer', 'Web Marketer'],
'Experience': [1.00000, 3.00000, 3.00000, 1.50000, 2.00000, 6.00000, 0.00000, 4.00000, 8.00000, 5.00000, 0.50000, 3.00000, 3.00000, 0.00000, 2.00000, 3.00000, 0.50000, 3.00000, 3.00000, 8.00000, 3.50000]})
levels = ["beginner", "intermediate", "advanced", "expert"]
jobs = ["Commercial Manager", "Business Developer", "Web Marketer", "Traffic Manager"]
def convert(levels, jobs):
    for j in jobs:
        df["Level"] = pd.qcut(df.loc[df["Job"] == j, "Experience"].rank(method="first"), q = 4, labels = levels, duplicates = "drop")
    return df
convert(levels, jobs)
This is the output after using qcut:
Job Experience Level
0 Commercial Manager 1.00000 NaN
1 Traffic Manager 3.00000 intermediate
2 Web Marketer 3.00000 NaN
3 Commercial Manager 1.50000 NaN
4 Commercial Manager 2.00000 NaN
5 Web Marketer 6.00000 NaN
6 Commercial Manager 0.00000 NaN
7 Commercial Manager 4.00000 NaN
8 Traffic Manager 8.00000 expert
9 Business Developer 5.00000 NaN
10 Business Developer 0.50000 NaN
11 Web Marketer 3.00000 NaN
12 Traffic Manager 3.00000 intermediate
13 Traffic Manager 0.00000 beginner
14 Commercial Manager 2.00000 NaN
15 Business Developer 3.00000 NaN
16 Traffic Manager 0.50000 beginner
17 Commercial Manager 3.00000 NaN
18 Business Developer 3.00000 NaN
19 Business Developer 8.00000 NaN
20 Web Marketer 3.50000 NaN
It seems to work only for "Traffic Manager"; for every other job the experience level was replaced with NaN. I'm really lost. Any help, please?
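A minimal sketch of why this happens, using a hypothetical toy frame (jobs "A" and "B" are made up for illustration): each pass through the loop assigns a Series that covers only the current job's rows to the whole "Level" column. pandas aligns that assignment on the index, so rows belonging to other jobs become NaN, and each iteration throws away the previous one. Only the last job in the list survives, which is "Traffic Manager" in the question.

```python
import pandas as pd

# Hypothetical toy data: two jobs, three rows each
df = pd.DataFrame({
    "Job": ["A", "B", "A", "B", "A", "B"],
    "Experience": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

for j in ["A", "B"]:
    # Only this job's rows (and their index labels)
    subset = df.loc[df["Job"] == j, "Experience"]
    # Assigning a Series to a whole column aligns on the index:
    # rows whose labels are not in `subset` get NaN, and the
    # values written by the previous iteration are overwritten.
    df["Level"] = subset * 10

print(df)
```

After the loop, only the rows of job "B" (the last one processed) have a value; every "A" row is NaN.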
您想在 groupby 操作中執行此操作:
import numpy
import pandas
levels = ["beginner", "intermediate", "advanced", "expert"]
jobs = ["Commercial Manager", "Business Developer", "Web Marketer", "Traffic Manager"]
df = pandas.DataFrame({
    'Job': numpy.random.choice(levels, size=150),
    'Experience': numpy.random.uniform(0.25, 10.5, size=150)
}).assign(
    # the lambda receives the new frame, so groupby runs on this data, not an older df
    level=lambda d: (
        d.groupby(['Job'], group_keys=False)['Experience']  # for each unique job...
        # apply a quantile (quartile) cut; group_keys=False keeps the original index
        .apply(lambda g: pandas.qcut(g, q=4, labels=levels, duplicates="drop"))
    )
)
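The same groupby idea can be checked on the question's own sample data. This is a sketch rather than the answer author's code: it swaps "apply" for "transform" (my choice, because transform always returns a result aligned to the original row index) and ranks the values first, since this sample contains many tied experience values.

```python
import pandas as pd

levels = ["beginner", "intermediate", "advanced", "expert"]

# The question's sample data
df = pd.DataFrame({
    "Job": ["Commercial Manager", "Traffic Manager", "Web Marketer", "Commercial Manager",
            "Commercial Manager", "Web Marketer", "Commercial Manager", "Commercial Manager",
            "Traffic Manager", "Business Developer", "Business Developer", "Web Marketer",
            "Traffic Manager", "Traffic Manager", "Commercial Manager", "Business Developer",
            "Traffic Manager", "Commercial Manager", "Business Developer", "Business Developer",
            "Web Marketer"],
    "Experience": [1.0, 3.0, 3.0, 1.5, 2.0, 6.0, 0.0, 4.0, 8.0, 5.0, 0.5,
                   3.0, 3.0, 0.0, 2.0, 3.0, 0.5, 3.0, 3.0, 8.0, 3.5],
})

# Within each job: rank experience (ties broken by order of appearance),
# then cut the ranks into quartiles. transform maps each group's result
# back onto the original rows, so every row gets a level for its own job.
df["Level"] = df.groupby("Job")["Experience"].transform(
    lambda g: pd.qcut(g.rank(method="first"), q=4, labels=levels)
)

print(df)
```

Every row now has a level, and each job is split into its own four quartiles instead of only the last job being filled in.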
I would change just two things in what Paul suggested (use "jobs" instead of "levels", and add rank(method="first")), because there was still an error:
import numpy
import pandas
levels = ["beginner", "intermediate", "advanced", "expert"]
jobs = ["Commercial Manager", "Business Developer", "Web Marketer", "Traffic Manager"]
df = pandas.DataFrame({
    'Job': numpy.random.choice(jobs, size=150),
    'Experience': numpy.random.uniform(0.25, 10.5, size=150)
}).assign(
    level=lambda d: (
        d.groupby(['Job'], group_keys=False)['Experience']  # for each unique job...
        # apply a quantile (quartile) cut on the ranks, so tied values cannot collapse bins
        .apply(lambda g: pandas.qcut(g.rank(method="first"), q=4, labels=levels, duplicates="drop"))
    )
)
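The error that rank(method="first") avoids comes from tied values: when several quartile edges are equal, duplicates="drop" merges them, which leaves fewer bins than labels, and qcut raises a ValueError. A small sketch with made-up values:

```python
import pandas as pd

levels = ["beginner", "intermediate", "advanced", "expert"]

# Made-up values with many ties: the 0%, 25%, 50% and 75% quantiles are all 0.0
experience = pd.Series([0.0, 0.0, 0.0, 0.0, 3.0])

raw_failed = False
try:
    # duplicates="drop" collapses the edges to [0.0, 3.0] (a single bin),
    # which no longer matches the four labels
    pd.qcut(experience, q=4, labels=levels, duplicates="drop")
except ValueError as exc:
    raw_failed = True
    print("qcut on raw values failed:", exc)

# Ranking first turns the values into 1..5, so all quartile edges are distinct
ranked_levels = pd.qcut(experience.rank(method="first"), q=4, labels=levels)
print(ranked_levels.tolist())
```

The trade-off is that tied experience values can land in different levels, depending on the order in which they appear.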