Adding new columns based on values in other columns

Question

I'm struggling to convert the following line of code to R:

# c_a: the distinct genre names; df['genre']: comma-separated genres per row
for genre in c_a:
    df['is_'+str(genre)] = df['genre'].apply(lambda x: genre in [y.strip() for y in x.split(',')])

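Basically, I have an object (of type "character", containing 1341 values), and I want to add a new column for each of its values, filled with 0/1 depending on whether that value appears in the genre column.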

For example:

Current input:

genre
dance pop, pop
country, pop

Expected output:

genre           dance pop  pop  country
dance pop, pop  1          1    0
country, pop    0          1    1

I'm not familiar with apply and lambda functions in R, and I only know how to solve the problem with a for loop, which is slow.
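For reference, here is a minimal, standard-library-only Python sketch of the transformation described above (the function name genre_indicators is hypothetical). Each row is split and trimmed once into a set, so filling a 0/1 cell is a set lookup instead of re-splitting the string for every genre as the loop in the question does. Whether this is fast enough for 1341 genres is an assumption, not something measured here.

```python
def genre_indicators(rows: list[str]) -> tuple[list[str], list[list[int]]]:
    """Return (genre names, 0/1 matrix) for comma-separated genre strings."""
    # Parse each row once into a set of trimmed genre names.
    parsed = [{g.strip() for g in row.split(",")} for row in rows]
    # Collect distinct genres in first-seen order (like unique() in R).
    genres = list(dict.fromkeys(g.strip() for row in rows for g in row.split(",")))
    # One set lookup per (row, genre) cell instead of re-splitting the string.
    matrix = [[int(g in row_set) for g in genres] for row_set in parsed]
    return genres, matrix


genres, matrix = genre_indicators(["dance pop, pop", "country, pop"])
print(genres)   # ['dance pop', 'pop', 'country']
for row in matrix:
    print(row)  # [1, 1, 0] then [0, 1, 1]
```

Because membership is tested on whole trimmed pieces, "pop" is not counted as present in a row that only contains "dance pop".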

Answer 1

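Python:

import pandas as pd

df = pd.DataFrame({"Genre": ["Dance pop, pop", "country, pop"]})
# Distinct genres in first-seen order; strip() drops the space after each comma
genres = list(dict.fromkeys(g.strip() for s in df["Genre"] for g in s.split(",")))
for col in genres:
    # 1 if the genre is one of this row's comma-separated entries, else 0
    df[col] = df["Genre"].apply(lambda x: int(col in [g.strip() for g in x.split(",")]))
df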
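If pandas is available, Series.str.get_dummies builds the same kind of 0/1 columns in a single call. This is a sketch that assumes genres are separated by a comma with optional surrounding spaces, which a regex normalizes first; note that get_dummies orders the new columns alphabetically rather than in first-seen order.

```python
import pandas as pd

df = pd.DataFrame({"Genre": ["Dance pop, pop", "country, pop"]})
# Normalize separators to a bare comma, then build one 0/1 column per genre.
dummies = (
    df["Genre"]
    .str.replace(r"\s*,\s*", ",", regex=True)
    .str.get_dummies(sep=",")
)
# Attach the indicator columns next to the original column.
df = df.join(dummies)
print(df)
```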

Answer 2

You could use a tidyverse approach, but I doubt it will speed things up. Assuming your data is stored in a vector genre:

library(tidyverse)

genre <- c("dance pop, pop", "country, pop")

genre %>% 
  data.frame(genre = .) %>% 
  expand_grid(genres = unique(trimws(unlist(strsplit(genre, ","))))) %>% 
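  # exact match on the split pieces, so "pop" does not also match "dance pop"
  mutate(value = +map2_lgl(strsplit(genre, ",\\s*"), genres, ~ .y %in% .x)) %>% 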
  pivot_wider(names_from = genres)

This returns

# A tibble: 2 x 4
  genre          `dance pop`   pop country
  <chr>                <int> <int>   <int>
1 dance pop, pop           1     1       0
2 country, pop             0     1       1

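First, we use expand_grid to create a data.frame that pairs every value of genre with each unique genre extracted from the genre vector, stored in a new genres column.
Next, we check for each pair whether the value in genres appears in genre, and convert the result to a binary value.
Finally, we use pivot_wider to reshape the result into a rectangular (wide) table with one column per genre.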
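For comparison, the three steps above (expand_grid, a per-pair match, then pivot_wider) can be mirrored in plain Python. This is an illustrative sketch, not a translation of the tidyverse internals: it tests exact membership among the trimmed pieces rather than a substring match, and it keys the wide result by the genre string, so it assumes the input rows are distinct.

```python
from itertools import product

rows = ["dance pop, pop", "country, pop"]

# Step 1 (like expand_grid): pair every row with every unique genre.
unique_genres = list(dict.fromkeys(g.strip() for r in rows for g in r.split(",")))
long = list(product(rows, unique_genres))

# Step 2 (like mutate): 1 if g is one of the row's comma-separated genres.
# Exact membership, so "pop" is not matched inside "dance pop".
long = [(r, g, int(g in [p.strip() for p in r.split(",")])) for r, g in long]

# Step 3 (like pivot_wider): one dict per row, one key per genre.
wide: dict[str, dict[str, int]] = {}
for r, g, value in long:
    wide.setdefault(r, {})[g] = value

for r, cols in wide.items():
    print(r, cols)
```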

If your data is stored in a data.frame, you can use a similar approach:

data.frame(genre = c("dance pop, pop", "country, pop")) %>% 
  expand_grid(genres = unique(trimws(unlist(strsplit(.$genre, ","))))) %>% 
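  # exact match on the split pieces, so "pop" does not also match "dance pop"
  mutate(value = +map2_lgl(strsplit(genre, ",\\s*"), genres, ~ .y %in% .x)) %>% 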
  pivot_wider(names_from = genres)

This returns the same output.

2 answers

Answer 1: score 0, posted 2022-12-08 09:51:46

Answer 2: score 0, posted 2022-12-08 16:17:46