
Read .csv file with Names and Labels into R

I have a .csv file that I need to read into R. The first row contains the names (e.g. BFI1, BFI2, CAQ2) and the second row contains the question text, which I would also like to access in R (e.g. "I enjoy going to parties"). Each row after the first two corresponds to one participant.

I would like to be able to access both the codes and the text in R (e.g. to use grep to select all the questions from one survey, and to see the item text if needed). I need the numerical responses to be numeric.

BFI1, BFI2, CAQ1, CAQ2
Likes to read, Enjoys Parties, Is Nervous, Loves Books
3, 7, 1, 4
4, 5, 3, 3

I want to read this in so that I can access either the names (row 1) or the text (as labels, maybe). I have looked at the Hmisc package, but its label functionality seems limited.

Is there any way to read in this .csv file and access both of these values?

Not sure if you're okay with having the labels as a separate vector, but here's an idea. Suppose your file name is x.txt:

## set up an argument list for scan() - just to avoid repetition
scanArgs <- list(
    file = "x.txt", what = "", nlines = 1, sep = ",", strip.white = TRUE
)

## read the data with no header and add the first line as names
df <- setNames(
    read.table("x.txt", skip = 2, sep = ","), 
    do.call(scan, scanArgs)
)
#   BFI1 BFI2 CAQ1 CAQ2
# 1    3    7    1    4
# 2    4    5    3    3

## make the label vector
labels <- setNames(do.call(scan, c(scanArgs, skip = 1)), names(df))
#            BFI1             BFI2             CAQ1             CAQ2 
# "Likes to read" "Enjoys Parties"     "Is Nervous"    "Loves Books" 

So the elements of labels correspond to the columns of df, and the columns are numeric.
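With `df` and `labels` in this shape, selecting every item from one survey comes down to a pattern match on the names. A minimal sketch, rebuilding the two objects inline so it runs on its own (the variable names `caq`, `caq_data`, `caq_text` and `party_items` are only for illustration):

```r
# Rebuild the example objects so this snippet runs on its own
df <- data.frame(BFI1 = c(3, 4), BFI2 = c(7, 5), CAQ1 = c(1, 3), CAQ2 = c(4, 3))
labels <- c(BFI1 = "Likes to read", BFI2 = "Enjoys Parties",
            CAQ1 = "Is Nervous", CAQ2 = "Loves Books")

# Names of all items belonging to the CAQ survey
caq <- grep("^CAQ", names(df), value = TRUE)

# Numeric responses and item text for those items
caq_data <- df[, caq, drop = FALSE]
caq_text <- labels[caq]

# Search the item text instead, e.g. items that mention "Parties"
party_items <- names(labels)[grepl("Parties", labels)]
```

Because `labels` is named by column, the same character vector indexes both the data frame and the label vector.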

Note that x.txt was created with

txt <- 'BFI1, BFI2, CAQ1, CAQ2
Likes to read, Enjoys Parties, Is Nervous, Loves Books
3,7,1,4
4,5,3,3'
writeLines(txt, "x.txt")

You can use the nrows and skip arguments of read.csv

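nameFile <- "data.csv"

# read the first two lines; header = FALSE so that neither row is
# consumed as a header, strip.white = TRUE to drop spaces after commas
vectorNames <- unlist(read.csv(nameFile, nrows = 1, header = FALSE,
    stringsAsFactors = FALSE, strip.white = TRUE), use.names = FALSE)
vectorDescription <- unlist(read.csv(nameFile, nrows = 1, skip = 1, header = FALSE,
    stringsAsFactors = FALSE, strip.white = TRUE), use.names = FALSE)

# read the data; header = FALSE keeps the first participant as a data row
dfIn <- read.csv(nameFile, skip = 2, header = FALSE)
names(dfIn) <- vectorNames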

@Richard Scriven I used your code and followed it up with this, using the Hmisc package:

library(Hmisc)

## copy each column into y, attaching its label on the way
y <- data.frame(temp = rep(NA, nrow(df)))
for (i in seq_along(labels)) {
    x <- df[, i]
    label(x) <- labels[i]
    y[names(df)[i]] <- x
}
y$temp <- NULL
y
#   BFI1 BFI2 CAQ1 CAQ2
# 1    3    7    1    4
# 2    4    5    3    3
label(y)
#            BFI1             BFI2             CAQ1             CAQ2 
# "Likes to read" "Enjoys Parties"     "Is Nervous"    "Loves Books" 
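If you would rather not depend on Hmisc, a similar effect can be sketched in base R by storing each label as a plain `"label"` attribute on its column. This is an alternative technique, not what the answer above does, and `get_labels` is a hypothetical helper name:

```r
# Rebuild the example objects so this snippet runs on its own
df <- data.frame(BFI1 = c(3, 4), BFI2 = c(7, 5), CAQ1 = c(1, 3), CAQ2 = c(4, 3))
labels <- c(BFI1 = "Likes to read", BFI2 = "Enjoys Parties",
            CAQ1 = "Is Nervous", CAQ2 = "Loves Books")

# Attach each label to its column as a plain "label" attribute
for (nm in names(df)) {
  attr(df[[nm]], "label") <- labels[[nm]]
}

# Hypothetical helper: collect the "label" attribute of every column,
# returning NA for columns that have none
get_labels <- function(d) {
  vapply(d, function(col) {
    lab <- attr(col, "label")
    if (is.null(lab)) NA_character_ else lab
  }, character(1))
}

get_labels(df)
```

The columns stay numeric, but plain attributes on ordinary vectors are dropped by row subsetting such as `df[1, ]`, so a separate label vector may be the more robust choice if the data will be filtered later.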

Building on Michelle Usuelli's answer, with Rich Scriven's correction, you can write this function:

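read_csv_with_labels <- function(fileName)
{
 library(Hmisc)

 # read the first two lines; header = FALSE keeps each row as data,
 # strip.white = TRUE drops the spaces after the commas
 varNames <- read.csv(fileName, nrows = 1, header = FALSE,
                      stringsAsFactors = FALSE, strip.white = TRUE)
 varLabels <- read.csv(fileName, nrows = 1, skip = 1, header = FALSE,
                       stringsAsFactors = FALSE, strip.white = TRUE)

 # read the data; header = FALSE so the first participant is not used as a header
 df <- read.csv(fileName, skip = 2, header = FALSE)

 # assign variable names and per-column labels to the dataframe
 names(df) <- unlist(varNames, use.names = FALSE)
 label(df, self = FALSE) <- unlist(varLabels, use.names = FALSE)

 return(df)
}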

I think this should be included in the basic functionality of read.csv and read_csv.
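For comparison, the same idea can be sketched in base R with a single pass over the file. `read_labelled_csv` is a hypothetical name, and the naive comma split assumes that neither the names nor the labels contain quoted commas:

```r
# Hypothetical helper: read names (line 1), labels (line 2) and data (rest)
read_labelled_csv <- function(path) {
  lines <- readLines(path)

  # Split a header line on commas and trim surrounding whitespace
  split_line <- function(x) trimws(strsplit(x, ",", fixed = TRUE)[[1]])
  var_names  <- split_line(lines[1])
  var_labels <- split_line(lines[2])

  # Parse the remaining lines as headerless data
  dat <- read.csv(text = lines[-(1:2)], header = FALSE, col.names = var_names)

  list(data = dat, labels = setNames(var_labels, var_names))
}

# Usage on a temporary copy of the example file
tmp <- tempfile(fileext = ".csv")
writeLines(c("BFI1, BFI2, CAQ1, CAQ2",
             "Likes to read, Enjoys Parties, Is Nervous, Loves Books",
             "3, 7, 1, 4",
             "4, 5, 3, 3"), tmp)
res <- read_labelled_csv(tmp)
str(res$data)
res$labels[grep("^BFI", names(res$labels))]
```

Returning the data and the labels side by side in a list keeps the responses as ordinary numeric columns while still letting grep work on either the names or the item text.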
