R text cleaning

Question

I have a CSV file with many entries like the following (one example provided):

Customer 1 car purchase
08/22/2016 08:10:00 Agent 1 (Agt1)
Customer 1 car purchase and service purchase.\n
Service indicates tires needed\n
possible oil change as well.\n
Tire quote provided.\n
*Name: Service advisor \n
*Phone: 123-456-7890 \n
Customer 1 called back to schedule appt.\n

I am trying to write R code so that the output for each entry is as follows:

Customer 1 car purchase and service purchase.
Service indicates tires needed and possible oil change as well.
Tire quote provided.
Customer 1 called back to schedule appt.

I am looking to strip out the first two lines and any lines with *Name and *Phone.

One thing I tried is assigning each entry to a temp variable and then:

library(stringi)               # provides stri_split_lines
x <- stri_split_lines(temp)    # split the entry into lines
y <- x[[1]][3:length(x[[1]])]  # keep everything from the third line on

This removes the first two lines. However, I am not sure how to remove the lines with *Name and *Phone, as they could be anywhere in the text. I am also quite convinced there is probably a better way out there :) Any ideas on how I can achieve this? The lines have \n at the end, so I was hoping to use regex to split on that, but was not able to get it to work. Thanks!

Answer 1

You can use readLines or strsplit to read in each entry (with lapply as necessary), and then grep to index:

x <- readLines(textConnection('Customer 1 car purchase
                               08/22/2016 08:10:00 Agent 1 (Agt1)
                               Customer 1 car purchase and service purchase.
                               Service indicates tires needed
                               possible oil change as well.
                               Tire quote provided.
                               *Name: Service advisor 
                               *Phone: 123-456-7890 
                               Customer 1 called back to schedule appt.'))

x <- trimws(x)    # clean up extra white space

x[c(-1, -2, -grep('\\*Name|\\*Phone', x))]
## [1] "Customer 1 car purchase and service purchase."
## [2] "Service indicates tires needed"               
## [3] "possible oil change as well."                 
## [4] "Tire quote provided."                         
## [5] "Customer 1 called back to schedule appt."
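
If an entry is already held in a single string (like `temp` in the question) rather than read from a connection, the same idea should work after `strsplit`. This is only a sketch: it assumes the line breaks are real newlines, possibly preceded by the literal two characters `\n` as in the question, and the names `temp`, `lines`, and `keep` are illustrative.

```r
# Hypothetical entry stored as one string. Real newlines separate the lines,
# and some lines also end in a literal backslash-n, as in the question.
temp <- "Customer 1 car purchase
08/22/2016 08:10:00 Agent 1 (Agt1)
Customer 1 car purchase and service purchase.\\n
*Name: Service advisor \\n
*Phone: 123-456-7890 \\n
Customer 1 called back to schedule appt.\\n"

# Split on a literal backslash followed by "n", or on a real newline.
lines <- strsplit(temp, "\\\\n|\n")[[1]]
lines <- trimws(lines)
lines <- lines[nzchar(lines)]   # drop empty pieces left by the split

# Drop the two header lines, then any line starting with *Name or *Phone.
lines <- lines[-(1:2)]
keep <- !grepl("^\\*(Name|Phone)", lines)
lines[keep]
```

Logical indexing with `grepl` is used here instead of negative `grep` indices so that an entry with no *Name or *Phone line is returned unchanged.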

Use paste to join the lines back into a single block if you like.
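
To handle many entries and join each one back into a single block, the steps can be wrapped in a small function. This is a sketch rather than part of the original answer: `clean_entry` and `entries` are hypothetical names, the sample entries are made up, and each entry is assumed to be a character vector of lines such as `readLines` returns.

```r
# Hypothetical helper: clean one entry given as a character vector of lines.
clean_entry <- function(lines) {
  lines <- trimws(lines)
  lines <- lines[-(1:2)]                             # drop the two header lines
  lines <- lines[!grepl("^\\*(Name|Phone)", lines)]  # drop *Name / *Phone lines
  paste(lines, collapse = "\n")                      # join back into one block
}

# Two made-up entries for illustration.
entries <- list(
  c("Customer 1 car purchase",
    "08/22/2016 08:10:00 Agent 1 (Agt1)",
    "Tire quote provided.",
    "*Name: Service advisor ",
    "Customer 1 called back to schedule appt."),
  c("Customer 2 service",
    "08/23/2016 09:00:00 Agent 2 (Agt2)",
    "*Phone: 123-456-7890 ",
    "Oil change completed.")
)

# vapply behaves like lapply but checks that each result is a single string.
cleaned <- vapply(entries, clean_entry, character(1))
cat(cleaned, sep = "\n\n")
```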

Answer 1
0 | Accepted | 2016-12-13 01:20:39
