Python: How to one-hot encode a feature that contains multiple values?

Question

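I have the following dataframe df, where the route column lists the cities a flight travels through and ticket_price holds the fare for that route.

I want to extract the individual city names from route and one-hot encode them.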

Dataframe (df)

id|         route               | ticket_price
1 | Mumbai - Pune - Bangalore   |   10000
2 | Pune - Delhi                |    7000
3 | Delhi - Pune                |    6500

Required Dataframe (df_encoded)

id | route_mumbai | route_pune | route_bangalore | route_delhi | ticket_price
1  |      1       |      1     |      1          |     0       |   10000
2  |      0       |      1     |      0          |     1       |    7000
3  |      0       |      1     |      0          |     1       |    6500

Code
I have done some preprocessing on the route column with the following code, but I can't figure out how to one-hot encode it.

def location_preprocessing(text):

  """
  Function to Preprocess the features having location names.
  """

  text = text.replace(" ", "")    # Remove whitespaces
  text = text.split("-")          # Obtain individual cities (sample routes are separated by "-")

  lst_text = [x.lower() for x in text]    # Lowercase city names

  text = " ".join(lst_text)               # Convert to string from list

  return text

df['route'] = df['route'].apply(location_preprocessing)

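If I apply one-hot encoding directly with the code below, every route is treated as a unique value and gets its own one-hot column, which is not what I want. I want the individual cities to be one-hot encoded, not the routes.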

df = pd.get_dummies(df, columns = ['route'])    # One-hot Encoding `route`
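
To illustrate the behavior described above, here is a minimal sketch. It assumes a small frame rebuilt from the sample table (the names df_demo, whole_route and route_columns are only for illustration) and uses the raw route strings rather than the preprocessed ones:

```python
import pandas as pd

# Sample data rebuilt from the table in the question (illustrative name: df_demo).
df_demo = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "route": ["Mumbai - Pune - Bangalore", "Pune - Delhi", "Delhi - Pune"],
        "ticket_price": [10000, 7000, 6500],
    }
)

# get_dummies treats each full route string as a single category,
# so every distinct route gets its own indicator column.
whole_route = pd.get_dummies(df_demo, columns=["route"])

# Collect the generated indicator columns to see what was encoded.
route_columns = [c for c in whole_route.columns if c.startswith("route_")]
print(route_columns)
```

With three distinct routes this gives three route_* columns, one per route string, instead of one column per city.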

How do I get the required dataframe?

Answer 1

If you have this dataframe:

   id                      route  ticket_price
0   1  Mumbai - Pune - Bangalore         10000
1   2               Pune - Delhi          7000
2   3               Delhi - Pune          6500

Then:

df.route = df.route.str.split(" - ")
df_out = pd.concat(
    [
        df.explode("route")
        .pivot_table(index="id", columns="route", aggfunc="size", fill_value=0)
        .add_prefix("Route_"),
        df.set_index("id").ticket_price,
    ],
    axis=1,
)
print(df_out)

Prints:

    Route_Bangalore  Route_Delhi  Route_Mumbai  Route_Pune  ticket_price
id                                                                      
1                 1            0             1           1         10000
2                 0            1             0           1          7000
3                 0            1             0           1          6500
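
As a possible alternative (not the approach used in this answer), pandas also offers Series.str.get_dummies, which splits each string on a separator and builds one indicator column per token. The sketch below assumes the routes are always separated by " - ", as in the sample data, and lowercases the city names to match the column names asked for in the question. The names df_alt and city_flags are only for illustration:

```python
import pandas as pd

# Sample data rebuilt from the question (illustrative name: df_alt).
df_alt = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "route": ["Mumbai - Pune - Bangalore", "Pune - Delhi", "Delhi - Pune"],
        "ticket_price": [10000, 7000, 6500],
    }
)

# Lowercase the routes, split on " - " (assumed separator), and create
# one 0/1 indicator column per city, prefixed with "route_".
city_flags = (
    df_alt["route"]
    .str.lower()
    .str.get_dummies(sep=" - ")
    .add_prefix("route_")
)

# Put id first and ticket_price last, as in the required dataframe.
df_encoded = pd.concat(
    [df_alt[["id"]], city_flags, df_alt[["ticket_price"]]],
    axis=1,
)
print(df_encoded)
```

The city columns come out in alphabetical order, so reorder them afterwards if a specific column order matters. A route that lists the same city twice still gets a 1 in that column, whereas the pivot_table version with aggfunc="size" would count it as 2.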

Answer 1: 2 (accepted), 2021-04-02 10:14:36