Building a data frame by parsing character vectors in R

Question

I'm new to R and I'm trying to build a dataset from a museum's collection.

After scraping their website, I have a list of character vectors (let's say it is named "characteristics"), where each element looks like this:

[[4729]]
[1] " Date://2002 Medium://Pencil on paper Dimensions://22 1/2 x 30 1/8\" (57.2 x 76.5 cm) Credit Line://The Judith Rothschild Foundation Contemporary Drawings Collection Gift MoMA Number://1563.2005 Copyright://© 2015 Steve DiBenedetto"

From these vectors, I'd like to build a data frame like this:

     year    medium           dimensions    credit line    number
1   2002     Pencil on paper   etc...

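However, I can't manage to extract the data I need from the character vectors, because I'm struggling with regular expressions. The idea is to get whatever comes after " Date://" and before " Medium://". To complicate matters, not every element in the list has the same characteristics: some have only "Date" and "Medium", while others also include "Edition://", "Acquired through://", and so on.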
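For the specific "after Date://, before Medium://" case, a capture group in base R's sub() may be enough. The sketch below uses the example element from above (the © sign is written as \u00a9 so the source stays ASCII); the second string with an "Edition://" field is a made-up example. It also shows the catch: when the pattern does not match, sub() returns its input unchanged, which is why a fixed "between label A and label B" pattern struggles once fields vary between elements.

```r
# The example element from the question (\u00a9 is the copyright sign).
x <- " Date://2002 Medium://Pencil on paper Dimensions://22 1/2 x 30 1/8\" (57.2 x 76.5 cm) Credit Line://The Judith Rothschild Foundation Contemporary Drawings Collection Gift MoMA Number://1563.2005 Copyright://\u00a9 2015 Steve DiBenedetto"

# Capture the text between "Date://" and the first " Medium://";
# the lazy (.*?) stops at the first " Medium://" it meets.
pattern <- ".*?Date://(.*?) Medium://.*"
date <- sub(pattern, "\\1", x, perl = TRUE)
date

# Caveat: with no " Medium://" to anchor on (hypothetical element below),
# sub() finds no match and hands back the whole input string.
no_medium <- sub(pattern, "\\1", " Date://1998 Edition://1/10", perl = TRUE)
no_medium
```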

Compiling the list of years was easy enough, by just saving the first four digits in each list element:

library(stringr)

year <- character(0)

for (p in seq_along(characteristics)) {
  string <- characteristics[[p]]
  # keep the first run of four digits in each element (the year)
  year <- c(year, str_extract(string, "\\d{4}"))
}

This may not even be the fastest approach, but it works. However, I'm completely stuck on extracting the other variables from the list.
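On the speed question, a vectorized sketch in base R avoids growing "year" inside a loop. It assumes every list element is a single string; the three-element "characteristics" list below is a made-up stand-in. regexpr() finds the first four-digit run in all strings in one call, and regmatches() returns only the matched pieces, so elements without a match are left as NA (as str_extract() would give). Like the loop, it takes the first four digits anywhere in the string, so an element with no date could pick up digits from another field, such as a MoMA number.

```r
# Hypothetical stand-in for the scraped list; each element is one string.
characteristics <- list(
  " Date://2002 Medium://Pencil on paper",
  " Date://1998 Medium://Oil on canvas",
  " Medium://Ink on paper"
)

x <- unlist(characteristics)

# Start position of the first four-digit run in each string (-1 if none).
m <- regexpr("\\d{4}", x)

# regmatches() returns only the matches, so write them into the matched slots.
year <- rep(NA_character_, length(x))
year[m != -1] <- regmatches(x, m)

year
```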

Answer 1

Maybe good old read.table is an option as well:

txt <- c("Date://2002 Medium://Pencil on paper Dimensions://22 1/2 x 30 1/8\" (57.2 x 76.5 cm) Credit Line://The Judith Rothschild Foundation Contemporary Drawings Collection Gift MoMA Number://1563.2005 Copyright://© 2015 Steve DiBenedetto",
         "Date://2002 Medium://Pencil on paper Dimensions://22 1/2 x 30 1/8\" (57.2 x 76.5 cm) Credit Line://The Judith Rothschild Foundation Contemporary Drawings Collection Gift MoMA Number://1563.2005 Copyright://© 2015 Steve DiBenedetto")
# Turn each " Label://" marker into a tab, then read the tab-separated fields;
# colClasses = "character" keeps values such as 1563.2005 as text.
read.table(text = gsub("( Credit)?\\s?[A-Za-z]+://", "\t", txt), sep = "\t", quote = "", colClasses = "character", col.names = letters[1:7])[-1]
#      b               c                                 d                                                                           e         f                        g
# 1 2002 Pencil on paper 22 1/2 x 30 1/8" (57.2 x 76.5 cm) The Judith Rothschild Foundation Contemporary Drawings Collection Gift MoMA 1563.2005 © 2015 Steve DiBenedetto
# 2 2002 Pencil on paper 22 1/2 x 30 1/8" (57.2 x 76.5 cm) The Judith Rothschild Foundation Contemporary Drawings Collection Gift MoMA 1563.2005 © 2015 Steve DiBenedetto
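The read.table approach relies on every element having the same fields in the same order, and its regex consumes only one word before "://" (apart from the special-cased " Credit"), which is why "MoMA" stays at the end of the credit line in the output. A sketch of a label-driven variant in base R, under the assumption that the complete set of labels is known in advance: the "keys" vector and the helper parse_record() are hypothetical and must list every label that occurs (add Edition, acquisition, or other labels as needed). Each " Label://" is turned into "\tLabel\t", the string is split on tabs, and the values are indexed by label, so fields missing from an element become NA.

```r
# Hypothetical label set: it must include every "Label://" in the data.
keys <- c("Date", "Medium", "Dimensions", "Credit Line", "MoMA Number", "Copyright")

# Parse one scraped string into a character vector named by `keys`;
# labels absent from the string come back as NA.
parse_record <- function(string, keys) {
  alt <- paste(keys, collapse = "|")
  # " Label://" -> "\tLabel\t", so labels and values alternate between tabs.
  marked <- gsub(paste0(" ?(", alt, ")://"), "\t\\1\t", string)
  # The extra trailing tab keeps an empty last value from being dropped;
  # [-1] discards whatever precedes the first label.
  parts <- strsplit(paste0(marked, "\t"), "\t", fixed = TRUE)[[1]][-1]
  values <- parts[c(FALSE, TRUE)]
  names(values) <- parts[c(TRUE, FALSE)]
  out <- unname(values[keys])
  names(out) <- keys
  out
}

# Small stand-in for the scraped list; the second record lacks most fields.
characteristics <- list(
  " Date://2002 Medium://Pencil on paper Dimensions://22 1/2 x 30 1/8\" (57.2 x 76.5 cm) Credit Line://The Judith Rothschild Foundation Contemporary Drawings Collection Gift MoMA Number://1563.2005 Copyright://\u00a9 2015 Steve DiBenedetto",
  " Date://1998 Medium://Oil on canvas"
)

rows <- lapply(characteristics, parse_record, keys = keys)
df <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
# Syntactic column names: date, medium, dimensions, credit.line, ...
names(df) <- make.names(tolower(keys))
df
```

Because the labels are pasted into a regular expression unescaped, this assumes none of them contain regex metacharacters.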

Answer 1
0 2015-04-24 14:28:51