R: How to read a CSV file that contains non-dataset information

Question

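I have a .csv file that shows no line breaks when opened in Notepad. Notepad++ shows an LF character at the end of each line, but I can't figure out how to tell R to use that character as the line break, or how to replace it with CRLF or \n.

Edit: here is a sample file.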

Answer 1

Use fread from data.table, our fast and friendly file finagler:

library(data.table)

url <- 'https://dl.dropboxusercontent.com/u/8428744/Collaboration_vs_Publication_Year.csv'

# ignore first 14 rows per OP comment
df <- fread(url, skip = 14)  # in this case, it works even without skip=

# put first 14 rows somewhere else
other_stuff <- readLines(url, n = 14)

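Warning message: In fread("https://dl.dropboxusercontent.com/u/8428744/Collaboration_vs_Publication_Year.csv") : Stopped reading at empty line 23 but text exists afterwards (discarded): "© 2015 Elsevier B.V. All rights reserved. SciVal® is a registered trademark of Reed Elsevier Properties S.A., used under license."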

df
#                            V1 V2   V3   V4   V5   V6   V7   V8   V9  V10
# 1:           Brown University NA 0.80 0.84 0.81 0.79 0.79 0.79 0.76 0.64
# 2:        Columbia University NA 0.98 0.96 0.95 0.96 1.00 1.01 0.97 1.26
# 3:         Cornell University NA 0.94 0.92 0.93 0.95 0.93 0.98 0.94 1.26
# 4:          Dartmouth College NA 0.74 0.79 0.70 0.75 0.74 0.75 0.73 0.60
# 5:         Harvard University NA 1.08 1.05 1.06 1.10 1.09 1.10 1.08 0.97
# 6:       Princeton University NA 1.04 0.99 1.02 1.06 1.08 1.05 1.06 0.87
# 7: University of Pennsylvania NA 0.80 0.78 0.79 0.83 0.81 0.80 0.79 0.83
# 8:            Yale University NA 0.93 0.90 0.92 0.95 0.91 0.97 0.90 1.07

cat(other_stuff[nchar(other_stuff)>0], sep = '\n')
# ï»¿Data set,Collaboration vs Publication Year
# Entities,"Brown University, Columbia University, Cornell University, Dartmouth College, Harvard University, Princeton University, University of Pennsylvania, Yale University"
# Year range,2010 to >2015
# Filtered by,"not filtered"
# Data source,Scopus
# Date last updated,16 October 2015
# Date exported,19 November 2015
# Metric name,Specific metric,Self-citations,Types of publications included,Other options
# Collaboration,International collaboration,-,"Articles, reviews and conference papers","field-weighted"
# Name,Tags,Collaboration,
# ,,Overall,2010,2011,2012,2013,2014,2015,>2015,
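
The `ï»¿` at the start of the first metadata line above is what a UTF-8 byte-order mark (bytes EF BB BF) looks like when it is decoded in a single-byte Windows encoding. Below is a minimal sketch of one way to drop it, assuming the file really is UTF-8 with a BOM. The temporary file and its two lines are made up for illustration; `"UTF-8-BOM"` is the encoding name that base R's `file()` accepts for reading such files.

```r
# Build a small stand-in file (hypothetical content) that starts with a UTF-8 BOM.
tmp <- tempfile(fileext = ".csv")
out <- file(tmp, open = "wb")
writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), out)  # the byte-order mark
writeBin(charToRaw("Data set,Collaboration vs Publication Year\nData source,Scopus\n"), out)
close(out)

# Reading through a connection declared as "UTF-8-BOM" removes the mark.
con <- file(tmp, encoding = "UTF-8-BOM")
other_stuff <- readLines(con)
close(con)

print(other_stuff[1])
unlink(tmp)
```

The same encoding name can be passed to `read.csv()` through its `fileEncoding` argument.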

Answer 2

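Since you mentioned that you want to keep all of the data, you could try the following. The source file is messy, so this is not a fully automated solution; future files will need some additional massaging.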

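# read the whole file as a character vector, one element per line
myfile <- readLines("https://dl.dropboxusercontent.com/u/8428744/Collaboration_vs_Publication_Year.csv")

# data table: skip everything above the "Overall" row, which becomes the header
df1 <- read.csv(text=myfile, skip = grep("Overall", myfile) - 1)
# metadata: only the rows above the "Overall" row, read without a header
df2 <- read.csv(text=myfile, nrows = grep("Overall", myfile) - 1, header = FALSE)
# keep the df1 columns that are not entirely NA, bind them to the reshaped metadata, and drop the last row
finaldf <- data.frame(df1[, colSums(is.na(df1)) != nrow(df1)], t(unstack(df2, V2 ~ V1)))[-nrow(df1), ]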
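
The split-on-a-marker idea can also be tried without the linked file. The sketch below is not the code above: it uses a made-up, simplified stand-in (`raw_lines`) in which the table has a single header row starting with `Name,` and ends at the next blank line, whereas the real export spreads its header over two rows. All names here are hypothetical, and only base R is used.

```r
# Hypothetical stand-in for the export: metadata, blank line, table, blank line, footer.
raw_lines <- c(
  "Data set,Collaboration vs Publication Year",
  "Data source,Scopus",
  "",
  "Name,2010,2011",
  "Brown University,0.80,0.84",
  "Yale University,0.93,0.90",
  "",
  "(c) footer text"
)

# Locate the table: it starts at the header row and ends before the next blank line.
header_row <- grep("^Name,", raw_lines)[1]
blank_rows <- which(!nzchar(raw_lines))
last_row   <- min(blank_rows[blank_rows > header_row]) - 1

# Parse only the table block; check.names = FALSE keeps "2010" as a column name.
dat <- read.csv(text = raw_lines[header_row:last_row], check.names = FALSE)

# Keep the non-empty lines above the table as key/value metadata.
meta_lines <- raw_lines[seq_len(header_row - 1)]
meta <- read.csv(text = meta_lines[nzchar(meta_lines)], header = FALSE,
                 col.names = c("key", "value"), stringsAsFactors = FALSE)

print(dat)
print(meta)
```

Cutting the line vector before parsing keeps the footer out of the data frame, so no rows have to be dropped afterwards. For a real file, the marker pattern and the end-of-table rule would need to be adapted.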

Answer 1
3, accepted, 2015-11-19 16:08:21

Answer 2
0, 2015-11-19 16:53:33
