Add new columns based on values in other columns
I am struggling to convert the following lines of Python code into R.
for genre in c_a:
    df['is_'+str(genre)] = df['genre'].apply(lambda x: genre in [y.strip() for y in x.split(',')])
Basically, I have an object (type "character", with 1341 values in it), and I'd like to add a new column for each value of the variable, and assign a 0/1 value to each new column by checking whether that value is included in the genre column.
For example:

Current Input:

| Genre |
|---|
| dance pop, pop |
| country, pop |

Expected Output:

| Genre | dance pop | pop | country |
|---|---|---|---|
| dance pop, pop | 1 | 1 | 0 |
| country, pop | 0 | 1 | 1 |

I am not familiar with apply and lambda functions in R. I only know how to solve the problem with a for loop, which is slow.
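Python:
import pandas as pd
df = pd.DataFrame({"Genre": ["Dance pop, pop", "country, pop"]})
# Unique genres with surrounding whitespace stripped (set iteration order may vary)
genres = {g.strip() for row in df['Genre'] for g in row.split(',')}
for col in genres:
    # 1 if the genre is one of the row's comma-separated values, else 0
    df[col] = df['Genre'].apply(lambda x: int(col in [y.strip() for y in x.split(',')]))
df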
You could use a tidyverse approach, but I doubt it would speed things up. Suppose your data is stored in a vector genre:
library(tidyverse)
genre <- c("dance pop, pop", "country, pop")
genre %>%
  data.frame(genre = .) %>%
  expand_grid(genres = unique(trimws(unlist(strsplit(genre, ","))))) %>%
  mutate(value = +str_detect(genre, genres)) %>%
  pivot_wider(names_from = genres)
This returns
# A tibble: 2 x 4
  genre          `dance pop`   pop country
  <chr>                <int> <int>   <int>
1 dance pop, pop           1     1       0
2 country, pop             0     1       1
The code first builds a data.frame with an added genres column that contains all unique genres extracted from the genre vector.
It then looks for matches between the genres and the genre column, converting the result into a binary value.
Finally, pivot_wider brings it into a rectangular shape.
If your data is stored in a data.frame, a similar approach works:
data.frame(genre = c("dance pop, pop", "country, pop")) %>%
  expand_grid(genres = unique(trimws(unlist(strsplit(.$genre, ","))))) %>%
  mutate(value = +str_detect(genre, genres)) %>%
  pivot_wider(names_from = genres)
returning the same output.
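
One caveat that may be worth checking against your data: str_detect matches substrings, so a row containing only "dance pop" would also be flagged for "pop". The sample data hides this because both rows list "pop" explicitly. As a rough reference for the exact-token behaviour the original Python loop aims for, here is a minimal standard-library sketch. The function name one_hot_genres and the third sample row are assumptions added for illustration, not part of the question.

```python
def one_hot_genres(rows):
    """Return (genres, matrix) with exact-token 0/1 flags for each row.

    Hypothetical helper for illustration only.
    """
    # Split each row on commas and strip whitespace around every token.
    tokens = [[t.strip() for t in row.split(",")] for row in rows]
    # Collect unique genres in order of first appearance.
    genres = list(dict.fromkeys(t for row in tokens for t in row))
    # A genre is flagged only if it equals a whole token in that row.
    matrix = [[int(g in row) for g in genres] for row in tokens]
    return genres, matrix


# The third row is an added test case: it lists "dance pop" but not "pop".
rows = ["dance pop, pop", "country, pop", "dance pop"]
genres, matrix = one_hot_genres(rows)
print(genres)
for row, flags in zip(rows, matrix):
    print(row, flags)

# Substring matching, by contrast, finds "pop" inside "dance pop".
substring_flag = int("pop" in "dance pop")
print(substring_flag)
```

With exact-token matching the third row gets 0 for "pop", whereas a substring test gives 1. If your genre list contains names that are substrings of other names, comparing against the split and trimmed tokens is the safer check in either language.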