
How to split strings and save as data frame in R?

I am trying to split strings based on the number of newlines each string contains. If a string contains two newlines, I want only the last two pieces (the first two counting from the right). Otherwise, I just want to split the string. In both cases the result should be saved in a data frame.

Here is some sample data:

data <- data.frame(Info = NA, Variable = NA)

strings <- c(" Fulton Allem \n Full Name", " 5 ft, 11 in\n 180 cm\n Height", "215 lbs\n 97 kg\n Weight")

I want the following result:

Info               Variable
Fulton Allem       Full Name
180 cm             Height
97 kg              Weight

Here is my attempt:

splitted <- stringi::stri_split_regex(strings, "\n")

But this does not work for strings with two newlines. Height and weight are each given in two units, but they are the same measurement. Hence, I want kg for weight and cm for height.

Please note that the strings can be dynamic. The info for each person varies, and some of them do not contain such information. So I can't use a regex that simply extracts those specific strings.

You can try the following with str_match from stringr:

stringr::str_match(strings, '(?:.*\n)?\\s(.*)\n\\s(.*)')[, -1]

#        [,1]            [,2]       
#[1,] "Fulton Allem " "Full Name"
#[2,] "180 cm"        "Height"   
#[3,] "97 kg"         "Weight"  

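Here we capture the second-to-last and last values separated by '\n' in each string. The optional non-capturing group (?:.*\n)? consumes a leading line when there are three pieces, and [, -1] drops the full-match column so only the two captured groups remain. Note that the first captured value keeps its trailing space ("Fulton Allem ").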
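If the result has to end up in a data frame with the Info and Variable columns from the question, a base R approach without a regex is also possible. The following is only a sketch, assuming every string contains at least one newline (so there are always at least two pieces); strings with fewer pieces would need separate handling, which is not shown here. The names parts and last_two are made up for illustration.

```r
strings <- c(" Fulton Allem \n Full Name",
             " 5 ft, 11 in\n 180 cm\n Height",
             "215 lbs\n 97 kg\n Weight")

# Split each string on newlines; strsplit() returns a list of character vectors.
parts <- strsplit(strings, "\n", fixed = TRUE)

# Keep only the last two pieces of each string and trim surrounding whitespace.
# Assumption: each element of `parts` has at least two pieces.
last_two <- lapply(parts, function(p) trimws(tail(p, 2)))

# Bind the pieces row-wise and name the columns as in the desired output.
data <- as.data.frame(do.call(rbind, last_two), stringsAsFactors = FALSE)
names(data) <- c("Info", "Variable")
data
```

Because trimws() is applied to every piece, the trailing space after "Fulton Allem" is removed as well.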


 