Add new columns based on values in another column

Question

I am struggling to convert the following Python code into R.

for genre in c_a:
    df['is_'+str(genre)] = df['genre'].apply(lambda x: genre in [y.strip() for y in x.split(',')])

Basically, I have an object (type "character", with 1341 values in it), and I'd like to add one new column per value, assigning 0/1 to each new column depending on whether that value appears in the genre column.

For example:

Current Input:

Genre
dance pop, pop
country, pop

Expected Output:

Genre	dance pop	pop	country
dance pop, pop	1	1	0
country, pop	0	1	1

I am not familiar with apply and lambda functions in R. I only know how to solve the problem with a for loop, which is slow.

Answer 1

Python:

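import pandas as pd

df = pd.DataFrame({"Genre": ["Dance pop, pop", "country, pop"]})

# Split each row on commas and strip whitespace, so ' pop' and 'pop' count as the same genre
genre_lists = df['Genre'].apply(lambda x: [g.strip() for g in x.split(',')])

# One 0/1 column per unique genre; sorted so the column order is deterministic
for col in sorted({g for genres in genre_lists for g in genres}):
    df[col] = genre_lists.apply(lambda genres: 1 if col in genres else 0)
df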
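As a possible alternative to the explicit loop, pandas can build the indicator columns in one step with Series.str.get_dummies. This is only a sketch: it assumes the genres are separated by commas with optional surrounding whitespace, and the names cleaned, indicators and result are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Genre": ["dance pop, pop", "country, pop"]})

# Normalise the separators so "a , b" and "a,b" both split cleanly on ","
cleaned = df["Genre"].str.replace(r"\s*,\s*", ",", regex=True).str.strip()

# str.get_dummies builds one 0/1 indicator column per distinct token
indicators = cleaned.str.get_dummies(sep=",")

# Attach the indicator columns to the original frame
result = df.join(indicators)
print(result)
```

The indicator columns come back in alphabetical order (country, dance pop, pop), so reorder them afterwards if a specific layout is needed.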

Answer 2

You could use a tidyverse approach, but I doubt it would speed things up. Suppose your data is stored in a vector genre:

library(tidyverse)

genre <- c("dance pop, pop", "country, pop")

genre %>% 
  data.frame(genre = .) %>% 
  expand_grid(genres = unique(trimws(unlist(strsplit(genre, ","))))) %>% 
  mutate(value = +str_detect(genre, genres)) %>% 
  pivot_wider(names_from = genres)

This returns

# A tibble: 2 x 4
  genre          `dance pop`   pop country
  <chr>                <int> <int>   <int>
1 dance pop, pop           1     1       0
2 country, pop             0     1       1

First we create a data.frame with a new genres column that contains all unique genres extracted from the genre vector, paired with every row.
Next we look for a match between the genres and the genre columns and convert the result into a binary value.
Finally we bring it into a rectangular shape using pivot_wider.
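One caveat worth checking against your own data: str_detect does regular-expression substring matching, so the pattern pop also matches inside dance pop. For the two example rows this makes no difference, but a row containing only dance pop would be flagged as pop too. The Python sketch below (the language of the question's original code) contrasts the two behaviors. The row data and the function names exact_match and substring_match are made up for illustration.

```python
# Hypothetical data: the first row has "dance pop" but no standalone "pop"
rows = ["dance pop", "country, pop"]

def exact_match(row, genre):
    """1 if genre equals one of the comma-separated, trimmed entries, else 0."""
    return int(genre in [g.strip() for g in row.split(",")])

def substring_match(row, genre):
    """1 if genre occurs anywhere in the raw string, else 0."""
    return int(genre in row)

exact = [exact_match(r, "pop") for r in rows]
substring = [substring_match(r, "pop") for r in rows]

print(exact)      # exact comparison: first row is not flagged as "pop"
print(substring)  # substring comparison: first row is flagged as "pop"
```

If exact matching matters, one option on the R side would be to compare each genre against the split and trimmed pieces of the row instead of using str_detect.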

If your data is stored in a data.frame, a similar approach works:

data.frame(genre = c("dance pop, pop", "country, pop")) %>% 
  expand_grid(genres = unique(trimws(unlist(strsplit(.$genre, ","))))) %>% 
  mutate(value = +str_detect(genre, genres)) %>% 
  pivot_wider(names_from = genres)

returning the same output.
