How do I import a txt file into R and separate text into columns based on certain criteria?
I have some job descriptions saved in a txt file. The job titles, job descriptions, etc. are all lumped together, and I am trying to separate them into columns. The text is about 5 pages long. Here is a sample of how the text is structured:
EXECUTIVE LEVEL
001 Chief Executive Officer: Job description of CEO.
040 Area Director: This line contains job description of the Area Director.
FINANCE TEAM
025 Chief Operating Officer: This line contains job description of the Chief Operating Officer
055 Chief Financial Officer: This person controls operations of the company and reports to the COO
MARKETING TEAM
056 Marketing Director: This person is in charge of the marketing team. Blab la bla
I would like to create a data frame (or is it called a tibble these days?) with 4 columns:
column 1 - The team name (Executive Level, Finance Team, Marketing Team, etc.)
column 2 - Team number (001, 040, 025, 055, etc.)
column 3 - The job title (Chief Executive Officer, Chief Operating Officer, etc.)
column 4 - The job description
Thanks in advance
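# drop empty lines
x2 <- x[nzchar(x)]
# start a new group at every line that begins with a capital letter (a team header)
x3 <- split(x2, cumsum(grepl("^[A-Z]", x2)))
# per group: parse "<number> <title>: <description>" lines, attach the header as `name`
x4 <- lapply(x3, function(z) transform(strcapture("^([0-9]+)\\s+([^:]+):\\s*(.*)$", z[-1], list(num="", title="", desc="")), name=z[1]))
# stack the per-team data frames into one
x5 <- do.call(rbind, x4)
x5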
# num title desc name
# 1.1 001 Chief Executive Officer Job description of CEO. EXECUTIVE LEVEL
# 1.2 040 Area Director This line contains job description of the Area Director. EXECUTIVE LEVEL
# 2.1 025 Chief Operating Officer This line contains job description of the Chief Operating Officer FINANCE TEAM
# 2.2 055 Chief Financial Officer This person controls operations of the company and reports to the COO FINANCE TEAM
# 3 056 Marketing Director This person is in charge of the marketing team. Blab la bla MARKETING TEAM
Data, likely the result of x <- readLines(path_to_file):
x <- c("EXECUTIVE LEVEL", "001 Chief Executive Officer: Job description of CEO.", "040 Area Director: This line contains job description of the Area Director.", "", "FINANCE TEAM", "025 Chief Operating Officer: This line contains job description of the Chief Operating Officer", "055 Chief Financial Officer: This person controls operations of the company and reports to the COO", "", "MARKETING TEAM", "056 Marketing Director: This person is in charge of the marketing team. Blab la bla")
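If the columns should come out in the order the question asks for (team, number, title, description), a variation along these lines might work. This is only a sketch in base R: the column names and the temporary file are illustrative choices of mine, and it assumes that every job line starts with a digit, that every other non-empty line is a team header, and that the file begins with a header.

```r
# Write the sample text to a temporary file so the sketch is self-contained;
# in practice, point readLines() at the real .txt file instead.
path <- tempfile(fileext = ".txt")
writeLines(c(
  "EXECUTIVE LEVEL",
  "001 Chief Executive Officer: Job description of CEO.",
  "040 Area Director: This line contains job description of the Area Director.",
  "",
  "FINANCE TEAM",
  "025 Chief Operating Officer: This line contains job description of the Chief Operating Officer",
  "055 Chief Financial Officer: This person controls operations of the company and reports to the COO",
  "",
  "MARKETING TEAM",
  "056 Marketing Director: This person is in charge of the marketing team. Blab la bla"
), path)

x <- readLines(path)
x <- x[nzchar(x)]                        # drop blank lines

# Assumption: job lines start with a digit, team headers do not.
is_header <- !grepl("^[0-9]", x)
# Carry each header forward to the lines that follow it.
team <- x[is_header][cumsum(is_header)]

# Split "<number> <title>: <description>" into three character columns.
jobs <- strcapture(
  "^([0-9]+)\\s+([^:]+):\\s*(.*)$",
  x[!is_header],
  data.frame(number = character(), title = character(), description = character())
)

result <- data.frame(team = team[!is_header], jobs, stringsAsFactors = FALSE)
result
```

Because the prototype columns are character, the numbers keep their leading zeros ("001" rather than 1). If a tibble is preferred, tibble::as_tibble(result) should convert it, assuming the tibble package is installed.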