How to select strings to read from file or data.frame by partial string match or regex in R?
Here is a sample of the file:
PG32 -13475.111367 9609.545216 -20675.190735 -194.319140
PG04 -15764.275182 19616.036013 -8378.361758 -9.567460
PG08 -23862.812721 9840.809904 -4415.011886 18.783955
PG10 25009.053940 9106.541565 2672.535304 -168.226094
PG14 -14188.519147 -9647.162991 -20079.808927 76.323202
PG13 12541.368512 -14252.727697 18475.956052 -99.144840
PG28 22638.858335 13831.226799 2650.716670 427.905209
PG21 -10609.714398 -12191.750707 21782.583544 -429.224611
PG11 -8677.979931 23944.136240 -7811.280190 -566.272355
PG22 -24991.333186 -9073.717145 -1692.043749 331.646741
PG20 25603.243214 5007.836647 5172.462172 302.625348
PG18 -19417.534666 -15923.466357 9597.721199 388.425996
The actual file is much larger. The first column is the satellite "name" (for example, "PG32"). I have a character vector with the satellite IDs:
>[1] "PG05" "PG07" "PG09" "PG10" "PG13" "PG16" "PG19" "PG20" "PG27" "PG28" "PG30"
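So I need to extract only the rows with these IDs, either from the data.frame or while reading the file with read.pattern from the gsubfn package. I have been trying to use regular expressions, but I do not fully understand the topic yet.
Consider scanning the file line by line with scan, checking on each iteration whether the first column is in the list of satellite names: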
## INITIAL VARS
file <- "C:\\Path\\To\\File.txt"
flines <- 12  # number of lines to read from the file
satnames <- c("PG05", "PG07", "PG09", "PG10", "PG13", "PG16",
              "PG19", "PG20", "PG27", "PG28", "PG30", "PG32")
## OPEN CONNECTION
con <- file(description=file, open="r")
## LOOP OVER CONNECTION
dfList <- list()
for (i in seq_len(flines)) {
  # READ ONE LINE AS FIVE CHARACTER FIELDS
  tmp <- scan(file = con, nlines = 1, what = list("", "", "", "", ""), quiet = TRUE)
  names(tmp) <- c("sat", "data1", "data2", "data3", "data4")
  # APPEND TO DFLIST ONLY IF IN SATNAMES LIST
  if (length(tmp$sat) == 1 && tmp$sat %in% satnames) {
    dfList <- c(dfList, list(tmp))
  }
}
# CLOSE CONNECTION
close(con)
# MIGRATE LIST TO DATA FRAME, CONVERTING DATA TYPES
df <- do.call(rbind, lapply(dfList, as.data.frame, stringsAsFactors = FALSE))
df[, 2:5] <- lapply(df[, 2:5], as.numeric)
rm(con, dfList, tmp)
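If the whole file fits in memory, a vectorized filter may be simpler than the loop above. The following is only a sketch in base R, not the read.pattern approach from the question. It assumes the file is whitespace-separated with five columns. The names txt, sat_df, by_match, by_regex and kept_lines are made up for illustration, and the sample rows are copied from the question. It shows an exact match with %in%, the same selection as a regex built from the ID vector, and a regex filter applied to the raw lines before parsing.

```r
# A few sample rows copied from the question; a real run would read the file.
txt <- "PG32 -13475.111367 9609.545216 -20675.190735 -194.319140
PG04 -15764.275182 19616.036013 -8378.361758 -9.567460
PG10 25009.053940 9106.541565 2672.535304 -168.226094
PG13 12541.368512 -14252.727697 18475.956052 -99.144840"

satnames <- c("PG05", "PG07", "PG09", "PG10", "PG13", "PG16",
              "PG19", "PG20", "PG27", "PG28", "PG30")

# Read all rows at once; for a file on disk, pass the file path instead of text =.
sat_df <- read.table(text = txt, stringsAsFactors = FALSE,
                     col.names = c("sat", "data1", "data2", "data3", "data4"))

# Option 1: exact match of the first column against the ID vector.
by_match <- sat_df[sat_df$sat %in% satnames, ]

# Option 2: the same selection as a regex, an anchored alternation of the IDs.
pattern <- paste0("^(", paste(satnames, collapse = "|"), ")$")
by_regex <- sat_df[grepl(pattern, sat_df$sat), ]

# Option 3: filter the raw text lines first, keeping lines that start with an ID
# followed by whitespace; the kept lines can then be parsed with read.table(text = ...).
raw_lines <- strsplit(txt, "\n", fixed = TRUE)[[1]]
line_pattern <- paste0("^(", paste(satnames, collapse = "|"), ")\\s")
kept_lines <- raw_lines[grepl(line_pattern, raw_lines)]

print(by_match)
```

For plain ID lookups, %in% is usually enough. The regex form becomes useful for partial matches, for example grepl("^PG1", sat_df$sat) to keep every ID that starts with PG1.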