Splitting a single column into multiple columns

I need to split a dataframe column so that each cell contains only one value. The current dataframe looks like:

 projectid|  count | Name |  Type                         |   count   |
 .....................................................................
 ABC      |  211   | jack |abc(Apple, Orange, Water melon)|   Multiple|
 DBG      | 90     | jill | Plum                          |   single  |

The new dataframe should look like:

 projectid|  count | Name |  Type          |  count |
 ....................................................
 ABC      |  211   | jack |  Apple         |Multiple|
 ABC      |  211   | jack |  Orange        |Multiple|
 ABC      |  211   | jack |  Water melon   |Multiple|
 DBG      |  90    | jill |  Plum          |single  |

I can split the single cell with a regular expression, using "()" and "," as separators. However, I can't figure out how to populate multiple columns.

One way would be to extract everything between the round brackets and then use separate_rows:

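library(dplyr)
library(tidyr)

df %>%
  # keep only the text inside the parentheses; values without parentheses stay as-is
  mutate(Type = sub(".*\\((.*)\\).*", "\\1", Type)) %>%
  # one row per value; "\\s*" drops the space that follows each comma
  separate_rows(Type, sep = ",\\s*")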

#  projectid count Name         Type  count.1
#1       ABC   211 jack        Apple Multiple
#2       ABC   211 jack       Orange Multiple
#3       ABC   211 jack  Water melon Multiple
#4       DBG    90 jill         Plum   single

The main part is the regex that extracts everything between the round brackets. Values without brackets, such as Plum, are left unchanged because sub returns its input when the pattern does not match. Once that is done, any of the methods linked in the original answer can be used to split the comma-separated values into separate rows.
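As one such alternative, here is a minimal base R sketch of the same two steps (extract, then split into rows). The data frame below is a hypothetical reconstruction of the one in the question, with the second count column named count.1 as in the output above, and the names inner, parts and out are introduced only for illustration.

```r
# Hypothetical reconstruction of the data frame from the question
df <- data.frame(
  projectid = c("ABC", "DBG"),
  count     = c(211, 90),
  Name      = c("jack", "jill"),
  Type      = c("abc(Apple, Orange, Water melon)", "Plum"),
  count.1   = c("Multiple", "single"),
  stringsAsFactors = FALSE
)

# Step 1: keep only the text inside the parentheses.
# sub() returns the input unchanged when the pattern does not match ("Plum").
inner <- sub(".*\\((.*)\\).*", "\\1", df$Type)

# Step 2: split each string on commas, dropping the whitespace after them
parts <- strsplit(inner, ",\\s*")

# Step 3: repeat each original row once per extracted value
out <- df[rep(seq_len(nrow(df)), lengths(parts)), ]
out$Type <- unlist(parts)
rownames(out) <- NULL

print(out)
```

This avoids the tidyr dependency, at the cost of handling the row repetition by hand; the result should have the same four rows as the separate_rows version.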
