
R: import CSV file, all data falls into one (the first) column

I'm new to R, and I have a problem:

I have a dataset (a CSV file) with 15 columns and 33,000 rows.

When I view the data in Excel it looks fine, but when I try to load it into RStudio I run into a problem.

I used this code:

x <- read.csv(file = "1energy.csv", head = TRUE, sep="")
View(x)

The result is that the column names are correct, but the data (row 2 onward) all ends up in the first column.

In that first column the values are separated by ";". But when I try this code:

x1 <- read.csv(file = "1energy.csv", head = TRUE, sep=";")

The next problem is: Error in read.table(file = file, header = header, sep = sep, quote = quote, : duplicate 'row.names' are not allowed

So I wrote this code:

x1 <- read.csv(file = "1energy.csv", head = TRUE, sep=";", row.names = NULL)

And it looks like it worked... But now the data is in the wrong columns (for example, the "name" column now contains the "time" value, and the "time" column contains the "costs" value).

Does anybody know how to fix this? I can rename the columns, but I don't think that is the best way.

Excel, at least in its English version, may use a comma as the separator, so you may want to try

x1 <- read.csv(file = "1energy.csv", head = TRUE, sep=",")

I once had a similar problem where the header had a long entry containing a character that read.csv mistook for a column separator. In reality it was part of a long name that wasn't quoted properly. Try skipping the header and see if the problem persists:

x1 <- read.csv(file = "1energy.csv", skip = 1, head = FALSE, sep=";")

In reply to your comment: there are two things you can do. The simplest is to assign the names manually:

myColNames <- c("col1.name", "col2.name")
names(x1) <- myColNames

The other way is to read only the name row (the first line in your file) and split it into a character vector:

nameLine <- readLines(con = "1energy.csv", n = 1)
fileColNames <- unlist(strsplit(nameLine, ";"))

Then see how you can fix the problem, and assign the names to your x1 data frame. I don't know what exactly is wrong with your first line, so I can't tell you how to fix it.
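As a rough sketch of that second approach, the snippet below builds a small demo file whose last column name contains an unquoted ";", reads the header and the data separately, and then repairs the names by hand. The file contents, the column names, and the decision to merge pieces 3 and 4 are assumptions made up for the illustration; your own header may need a different fix.

```r
# Hypothetical demo file: the third column name contains an unquoted ";".
demo_file <- tempfile(fileext = ".csv")
writeLines(c("name;time;costs (EUR; incl. tax)",
             "a;10;1.5",
             "b;20;2.5"), demo_file)

# Read only the header line and split it on the separator.
nameLine <- readLines(demo_file, n = 1)
fileColNames <- unlist(strsplit(nameLine, ";", fixed = TRUE))

# Read the data without the header line.
x1 <- read.csv(demo_file, skip = 1, header = FALSE, sep = ";")

# Compare the number of name pieces with the number of data columns.
print(length(fileColNames))
print(ncol(x1))

# In this demo, pieces 3 and 4 belong to a single name, so glue them back together.
fixedColNames <- c(fileColNames[1:2],
                   paste(fileColNames[3:4], collapse = ";"))
names(x1) <- fixedColNames
print(x1)
```

Comparing the two counts before assigning anything is the useful part: if they differ, the header line is where to look.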

Yet another, cruder option is to open your CSV file in a text editor and edit the column names.

This happens because of how Excel handles CSV files. An easy solution is to copy all your data (Ctrl+C) into Notepad and save it again from Notepad as filename.csv (don't forget to remove the .txt extension if necessary). It worked well for me: R opened the newly created CSV file correctly, with all the data separated into the right columns.


I had the exact same issue. I did quite a bit of research and found out that the CSV was ill-formed.

The header line of the CSV contained all the labels (separated by the separator) followed by a line break. Starting with line 2, there was an additional separator at the end of each line. So an example of such an ill-formed CSV file looks like this:

Field1;Field2   <-- see the *missing* semicolon at the end
12;23;          <-- see the *trailing* semicolon in each of the data lines
34;67;
45;56;

Such ill-formed files are even harder to spot when they are TAB-separated.

Excel does not care about this when importing CSV files. But R does care.
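To see the effect in isolation, here is a small reproduction using a made-up file shaped like the example above (the repeated value 12 in the first column is an assumption added so that the error appears). It is only a sketch of the behaviour described in the question, not a fix.

```r
# Hypothetical demo file: 2 header fields, but 3 fields per data line (trailing ";").
bad_file <- tempfile(fileext = ".csv")
writeLines(c("Field1;Field2", "12;23;", "12;67;", "45;56;"), bad_file)

# Because the data lines have one more field than the header, read.csv
# treats the first column as row names. The repeated 12 then fails.
err <- tryCatch(read.csv(bad_file, header = TRUE, sep = ";"),
                error = function(e) conditionMessage(e))
print(err)

# row.names = NULL avoids the error, but the first data column is now
# labelled "row.names" and every header label sits one column too far right.
x1 <- read.csv(bad_file, header = TRUE, sep = ";", row.names = NULL)
print(x1)
```

This matches the symptom in the question: each label ends up over the values of the following column.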

When you use skip=1, you skip the header line that contains part of the mismatch. The data frame will be imported correctly, but there will be an extra column of "NA" at the end of each row. And obviously you will not have column names, as these were skipped.

Easiest solution: edit the CSV file and either add an additional separator at the end of the header line as well, or remove the trailing delimiters in the data lines. You can also use R's generic read and write functions for text files to automate that editing.
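A minimal sketch of automating that edit with readLines and writeLines, assuming the separator is ";" and that every data line carries exactly one trailing separator. The demo file and the variable names are made up for the illustration.

```r
# Hypothetical demo file with the ill-formed layout shown above.
bad_file <- tempfile(fileext = ".csv")
writeLines(c("Field1;Field2", "12;23;", "34;67;", "45;56;"), bad_file)

# Read the raw text, leave the header alone, and strip one trailing ";"
# from each data line.
raw_lines <- readLines(bad_file)
raw_lines[-1] <- sub(";$", "", raw_lines[-1])

# Write the repaired text to a new file and import it as usual.
fixed_file <- tempfile(fileext = ".csv")
writeLines(raw_lines, fixed_file)
x1 <- read.csv(fixed_file, header = TRUE, sep = ";")
print(x1)
```

Writing to a new file rather than overwriting the original keeps the source data intact in case the assumption about the trailing separator turns out to be wrong.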

This problem can arise from the regional settings of the Excel application in which the .csv file was created.

While in most places a "," separates the columns in a COMMA-separated file (which makes sense), in other places the separator is a ";".

Depending on your regional settings, you can experiment with:

x1 <- read.csv(file = "1energy.csv", head = TRUE, sep=",") #used in North America

or,

x1 <- read.csv(file = "1energy.csv", head = TRUE, sep=";") #used in some parts of Asia and Europe
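For the semicolon case, base R also provides read.csv2, whose defaults are sep = ";" and dec = ",". The sketch below uses a made-up demo file with decimal commas to show the difference; whether your own file uses decimal commas is an assumption you would need to check.

```r
# Hypothetical demo file: semicolon-separated, with decimal commas.
demo_file <- tempfile(fileext = ".csv")
writeLines(c("name;time;costs", "a;10;1,5", "b;20;2,5"), demo_file)

# read.csv2 defaults to sep = ";" and dec = ",", so "1,5" is read as 1.5.
x1 <- read.csv2(demo_file, header = TRUE)
str(x1)
```

With read.csv and sep = ";" alone, a column such as costs would come in as text, because "1,5" is not a number under the default dec = ".".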

You can transform the data by having Excel split it into cells that correspond to the columns:

1. Open your CSV file.
2. Copy the content, paste it into a .txt file, save it, and copy its content.
3. Open a new Excel file.
4. In Excel, go to the section responsible for data. It is actually called "Data".
5. Then, on the left side, go to the external data query (in German, "externe Daten abfragen").
6. Go through the steps one by one and separate by commas.
7. Save your file as CSV.

Open your file in a text editor and see if it really is separated with commas. Sometimes .csv files are separated with tabs instead of commas or semicolons. Excel opens such files without a problem, but in R you have to specify the separator like this:

x <- read.csv(file = "1energy.csv", head = TRUE, sep="\t")
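If opening the file by hand is inconvenient, a crude check can be done from R by counting how often each candidate separator appears in the first few lines. This is only a heuristic sketch: the candidate list, the helper count_sep, and the demo file are assumptions, and the count can mislead when, for example, decimal commas appear in a semicolon-separated file.

```r
# Hypothetical demo file: tab-separated despite the .csv extension.
demo_file <- tempfile(fileext = ".csv")
writeLines(c("name\ttime\tcosts", "a\t10\t1.5", "b\t20\t2.5"), demo_file)

# Look at the first few raw lines.
first_lines <- readLines(demo_file, n = 5)

# Count literal occurrences of each candidate separator.
candidates <- c(comma = ",", semicolon = ";", tab = "\t")
count_sep <- function(sep) {
  hits <- regmatches(first_lines, gregexpr(sep, first_lines, fixed = TRUE))
  sum(lengths(hits))
}
sep_counts <- vapply(candidates, count_sep, integer(1))
print(sep_counts)

# Use the most frequent candidate as the separator.
best_sep <- candidates[[which.max(sep_counts)]]
x <- read.csv(demo_file, header = TRUE, sep = best_sep)
print(x)
```

Printing first_lines directly is often enough; the counting step just makes the guess explicit.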

I once had the same problem, and this was my solution. Hope it works for you.

You could use:

df <- read.csv("filename.csv", sep = ";", quote = "")

It solved a problem of mine that was similar to yours.
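One plausible reason this helps: quote = "" turns off quote handling, so a stray double quote inside a field can no longer swallow the separators and line breaks that follow it. The demo file below, with an unbalanced quote in one value, is an assumption made up to illustrate the idea; it may not be what was wrong with the original file.

```r
# Hypothetical demo file: one value contains an unbalanced double quote.
demo_file <- tempfile(fileext = ".csv")
writeLines(c("name;note", 'a;5" pipe', "b;ok"), demo_file)

# With quoting disabled, the quote character is kept as ordinary text
# and every line is split on ";" as expected.
df <- read.csv(demo_file, sep = ";", quote = "")
print(df)
```

The trade-off is that properly quoted fields containing the separator will no longer be protected, so this option suits files that do not rely on quoting.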

I had the same problem and it was frustrating...

However, I found the ultimate solution. First take the CSV file, convert it online to a JSON file, and download it. Then do the whole thing in reverse (convert the JSON back to CSV) online, download the converted file, and give it a name.

Then load it in RStudio:

file_name <- read.csv(file = 'name your file.csv')

It took me 4 days of thinking outside the box... 🙂🙂🙂
