
Python Pandas: How do I create a column given a condition based on another column?

Given the following dataframe:

import pandas as pd

df_test = pd.DataFrame(
    [[1, "BURGLARY"], [2, "PETIT LARCENY"], [3, "DANGEROUS DRUGS"], [4, "LOITERING FOR DRUG PURPOSES"], [5, "DANGEROUS WEAPONS"]],
    columns=['id', 'ofns_desc']
)


I want to add a new column that simplifies the descriptions in the ofns_desc column. I did the following:

THEFT = ["BURGLARY", "PETIT LARCENY"]
df_test.loc[df_test.ofns_desc.isin(THEFT), 'category'] = "THEFT"

DRUGS = ["DANGEROUS DRUGS", "LOITERING FOR DRUG PURPOSES"]
df_test.loc[df_test.ofns_desc.isin(DRUGS), 'category'] = "DRUGS"

So far, the code above works.

However, when I try to add an "OTHER" value to the category column, every value in the category column gets overwritten:

ALL_CAT = [THEFT, DRUGS]
df_test.loc[~df_test.ofns_desc.isin(ALL_CAT), 'category'] = "OTHER"


What am I doing wrong?

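The problem is that you are testing against a nested list, so every value fails the membership check. Join the lists with + instead of wrapping them in [], i.e. change: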

ALL_CAT = [THEFT, DRUGS]

to:

ALL_CAT = THEFT + DRUGS
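
To see why the nested list fails: each element of [THEFT, DRUGS] is itself a list, so no description string can ever equal one of them. Here is a minimal plain-Python sketch of the same membership check. It uses the built-in `in` operator rather than pandas, purely for illustration, and the names `nested`, `flat`, `in_nested` and `in_flat` are made up for this example:

```python
THEFT = ["BURGLARY", "PETIT LARCENY"]
DRUGS = ["DANGEROUS DRUGS", "LOITERING FOR DRUG PURPOSES"]

nested = [THEFT, DRUGS]   # two elements, each of them a list
flat = THEFT + DRUGS      # four elements, each of them a string

# A string never compares equal to a list, so the nested lookup always misses.
in_nested = "BURGLARY" in nested
in_flat = "BURGLARY" in flat

print(len(nested), in_nested)  # 2 False
print(len(flat), in_flat)      # 4 True
```

Because every row misses the nested lookup, the negated mask is True everywhere, which is why "OTHER" ends up in every row.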

Another idea is to build a dictionary and use Series.map, then replace the missing values with Series.fillna:

THEFT = ["BURGLARY", "PETIT LARCENY"]
DRUGS = ["DANGEROUS DRUGS", "LOITERING FOR DRUG PURPOSES"]
d = {"THEFT":THEFT, 'DRUGS':DRUGS}

#swap key values in dict
#http://stackoverflow.com/a/31674731/2901002
d1 = {k: oldk for oldk, oldv in d.items() for k in oldv}
print (d1)
{'BURGLARY': 'THEFT', 'PETIT LARCENY': 'THEFT',
 'DANGEROUS DRUGS': 'DRUGS', 'LOITERING FOR DRUG PURPOSES': 'DRUGS'}

df_test['category'] = df_test['ofns_desc'].map(d1).fillna("OTHER")
print (df_test)
   id                    ofns_desc category
0   1                     BURGLARY    THEFT
1   2                PETIT LARCENY    THEFT
2   3              DANGEROUS DRUGS    DRUGS
3   4  LOITERING FOR DRUG PURPOSES    DRUGS
4   5            DANGEROUS WEAPONS    OTHER
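
For readers who want to see the lookup logic on its own, here is a plain-Python analogue of the map-then-fillna step. It is only a sketch: `dict.get` with a default stands in for `Series.map(...).fillna("OTHER")`, and the `descriptions` and `categories` lists are illustrative names, not part of the answer above.

```python
THEFT = ["BURGLARY", "PETIT LARCENY"]
DRUGS = ["DANGEROUS DRUGS", "LOITERING FOR DRUG PURPOSES"]
d = {"THEFT": THEFT, "DRUGS": DRUGS}

# Invert the dict: each description becomes a key pointing at its category.
d1 = {desc: cat for cat, descs in d.items() for desc in descs}

descriptions = [
    "BURGLARY",
    "PETIT LARCENY",
    "DANGEROUS DRUGS",
    "LOITERING FOR DRUG PURPOSES",
    "DANGEROUS WEAPONS",
]

# Unknown descriptions fall back to "OTHER", like fillna after map.
categories = [d1.get(desc, "OTHER") for desc in descriptions]
print(categories)
# ['THEFT', 'THEFT', 'DRUGS', 'DRUGS', 'OTHER']
```

One thing to keep in mind with the inverted dictionary: if the same description were listed under two categories, the later one would silently win.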

It is better to use numpy.select for this; it performs well:

In [2555]: import numpy as np

In [2556]: THEFT = ["BURGLARY", "PETIT LARCENY"]

In [2557]: DRUGS = ["DANGEROUS DRUGS", "LOITERING FOR DRUG PURPOSES"]

In [2558]: conditions = [df_test.ofns_desc.isin(THEFT), df_test.ofns_desc.isin(DRUGS)]

In [2559]: choices = ['THEFT', 'DRUGS']

In [2564]: df_test['category'] = np.select(conditions, choices, default='OTHER')

In [2565]: df_test
Out[2565]: 
   id                    ofns_desc category
0   1                     BURGLARY    THEFT
1   2                PETIT LARCENY    THEFT
2   3              DANGEROUS DRUGS    DRUGS
3   4  LOITERING FOR DRUG PURPOSES    DRUGS
4   5            DANGEROUS WEAPONS    OTHER
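
What numpy.select does here can be modelled in plain Python: for each row it takes the choice belonging to the first condition that is true, and falls back to the default when none is. The sketch below is only a model of that rule, not NumPy's implementation, and `select_first_match` is a hypothetical helper name invented for this example:

```python
def select_first_match(conditions, choices, default):
    """For each row, return the choice of the first true condition, else default."""
    n_rows = len(conditions[0])
    result = []
    for row in range(n_rows):
        value = default
        for cond, choice in zip(conditions, choices):
            if cond[row]:
                value = choice
                break  # first match wins; later conditions are ignored
        result.append(value)
    return result


THEFT = ["BURGLARY", "PETIT LARCENY"]
DRUGS = ["DANGEROUS DRUGS", "LOITERING FOR DRUG PURPOSES"]
descriptions = [
    "BURGLARY",
    "PETIT LARCENY",
    "DANGEROUS DRUGS",
    "LOITERING FOR DRUG PURPOSES",
    "DANGEROUS WEAPONS",
]

# One boolean list per category, playing the role of the isin masks.
conditions = [
    [desc in THEFT for desc in descriptions],
    [desc in DRUGS for desc in descriptions],
]
choices = ["THEFT", "DRUGS"]

categories = select_first_match(conditions, choices, default="OTHER")
print(categories)
# ['THEFT', 'THEFT', 'DRUGS', 'DRUGS', 'OTHER']
```

The order of the conditions matters only when categories overlap; with disjoint lists like THEFT and DRUGS, any order gives the same result.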

Why not just fill the remaining NaN values instead?

df_test['category'] = df_test['category'].fillna("OTHER")
