Merging a txt file into a data frame in R
I have a txt file with more than 100,000 lines of data. I want to turn it into a data frame, but I don't need every line. An example data entry looks like this:
FN Clarivate Analytics Web of Science
VR 1.0
PT J
AU Yang, Qiang
Liu, Yang
Chen, Tianjian
Tong, Yongxin
TI Federated Machine Learning: Concept and Applications
SO ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY
VL 10
IS 2
AR 12
DI 10.1145/3298981
DT Article
PD FEB 2019
PY 2019
AB Today's artificial intelligence still faces two major challenges (...) etc.
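I only want the lines that start with TI, AU, PD, and AB, extracted into correspondingly named columns. This is what I have so far, and I'm really struggling!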
read.table("groupprojectdatabase.txt", header = FALSE, sep = ",", quote = "",
dec = ".", numerals = c("allow.loss"),
row.names = c("TI", "AU", "PB","AB"), col.names = c('title_col','author_col','date_col','summary_col'), as.is = !stringsAsFactors,
na.strings = "NA", colClasses = NA, nrows = -1,
skip = 0, check.names = TRUE, fill = FALSE,
strip.white = FALSE, blank.lines.skip = TRUE,
comment.char = "#",
allowEscapes = FALSE, flush = FALSE,
stringsAsFactors = FALSE,
fileEncoding = "", encoding = "unknown", text, skipNul = FALSE)
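Any help would be greatly appreciated, even if it is just the name of a function I need to look up or a hint about whether I'm on the right track. I think the sep = argument is relevant, but I don't know how to tell it to skip everything except the TI, AU, PD, and AB lines.
In particular, I'm not sure how to get R to treat a whole sentence as a single value rather than splitting it into individual words.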
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
line 1 did not have 4 elements
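I made a file test.txt based on your data above. After running into some problems with read.table, I switched to readr::read_delim from the tidyverse.
This reads the file line by line. Each line is then split at the first whitespace, i.e. after the first two letters.
Because four lines belong together (the author lines, of which only the first carries the two-letter tag AU), the last part of the tidyverse pipeline below puts these lines together.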
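The split-and-fill idea described above can be tried in isolation. The following is a minimal base R sketch, not the answer's own code. It assumes that continuation lines are indented with three spaces (as in a raw Web of Science export) and that the first line carries a tag; the `lines` vector is inline sample data copied from the question.

```r
# Inline sample in the layout of the question. Assumption: continuation
# lines start with three spaces instead of a two-character tag.
lines <- c(
  "PT J",
  "AU Yang, Qiang",
  "   Liu, Yang",
  "   Chen, Tianjian",
  "TI Federated Machine Learning: Concept and Applications"
)

# Split each line after the first two characters: tag, then the rest.
tag   <- substr(lines, 1, 2)
value <- trimws(substring(lines, 4))

# Continuation lines have a blank tag; carry the last real tag forward.
is_tag <- grepl("^[A-Z][A-Z0-9]$", tag)
tag    <- tag[is_tag][cumsum(is_tag)]

parsed <- data.frame(tag = tag, value = value, stringsAsFactors = FALSE)
print(parsed)
```

Each whole sentence stays in a single `value` cell because the split only happens once per line, at a fixed position, rather than at every space.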
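library(tidyverse)
# test.txt contains no ";", so every line is read as a single column.
# With col_names = TRUE the first line of the file becomes the column name.
df <- read_delim("path_to_your/test.txt", delim = ";", col_names = TRUE)
ddf <- df |>
  # Split each line at the first space: two-letter tag, then the rest.
  separate(`FN Clarivate Analytics Web of Science`,
           into = c("first", "rest"),
           sep = " ", extra = "merge") |>
  # Continuation lines have an empty tag; fill it down from the line above.
  mutate(first = ifelse(first == "", NA, first)) |>
  fill(first) |>
  # Collapse all lines that share a tag into one string, one row per tag.
  group_by(first) |>
  mutate(rest = paste0(rest, collapse = "")) |>
  distinct(first, .keep_all = TRUE)
# Keep only the four tags of interest.
ddf |>
  filter(first %in% c('TI', 'AU', 'PD', 'AB'))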
#> # A tibble: 4 × 2
#> # Groups: first [4]
#> first rest
#> <chr> <chr>
#> 1 AU Yang, Qiang Liu, Yang Chen, Tianjian Tong, Yongxin
#> 2 TI Federated Machine Learning: Concept and Applications
#> 3 PD FEB 2019
#> 4 AB Today's artificial intelligence still faces two major challenges
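The result above is one row per tag for a single record. For a file with many records, and to get the named columns the question asks for, the same idea can be extended as in the sketch below. This is a base R sketch under stated assumptions, not the answer's method: `parse_wos` is a hypothetical helper name, every record is assumed to begin with a `PT` line, continuation lines are assumed to be indented, and the second record in the sample is a made-up placeholder used only to show the grouping.

```r
# Hypothetical helper (not from any package): one row per record,
# one column per requested tag.
parse_wos <- function(lines, keep = c("TI", "AU", "PD", "AB")) {
  lines <- lines[nzchar(trimws(lines))]        # drop blank lines
  tag   <- substr(lines, 1, 2)
  value <- trimws(substring(lines, 4))

  # Fill the tag down over continuation lines (assumed to be indented).
  is_tag <- grepl("^[A-Z][A-Z0-9]$", tag)
  tag    <- c(NA, tag[is_tag])[cumsum(is_tag) + 1]

  # Assumption: every record begins with a PT line.
  record <- cumsum(is_tag & tag == "PT")

  wanted <- record > 0 & tag %in% keep
  # Join multi-line fields (e.g. several authors) with "; ".
  joined <- tapply(value[wanted],
                   list(record[wanted], factor(tag[wanted], levels = keep)),
                   paste, collapse = "; ")
  as.data.frame(joined, stringsAsFactors = FALSE)
}

sample_lines <- c(
  "FN Clarivate Analytics Web of Science",
  "VR 1.0",
  "PT J",
  "AU Yang, Qiang",
  "   Liu, Yang",
  "TI Federated Machine Learning: Concept and Applications",
  "PD FEB 2019",
  "AB Today's artificial intelligence still faces two major challenges",
  "ER",
  "",
  "PT J",                          # placeholder record, purely illustrative
  "AU Doe, Jane",
  "TI A made-up second record",
  "PD MAR 2020",
  "ER",
  "EF"
)

papers <- parse_wos(sample_lines)
print(papers)
```

For the real file the call would presumably be `parse_wos(readLines("groupprojectdatabase.txt"))`. Joining with "; " suits author lists; a title or abstract that wraps across lines would read better joined with a single space, so the separator may need to vary by tag.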