
Plotting independent data sets of different size in R

I want to read 3 independent data sets, each of a different size, and plot them using a boxplot, e.g.:

Set1 Set2 Set3
1    1    1
1    2    2
1    2    2
     3    3
     3    3
          4

(As string: "Set1 Set2 Set3\\n1 1 1\\n1 2 2\\n1 2 2\\n 3 3\\n 3 3\\n 4\\n")

However, the column width could vary, e.g. when a value has more than 5 digits.

When I do results = read.table("data.dat", header=TRUE), RStudio reports:

line 4 did not have 3 elements

Using the option fill=TRUE would shift every field in line 4 to the left and fill the empty field on the right with an NA, which directly biases the data.
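The shift described above can be reproduced with a short sketch. It reads the sample from an inline string rather than from data.dat, and it assumes the file uses the 5-character column layout shown in the example:

```r
# Sample text, assumed to match data.dat (columns padded to 5 characters).
txt <- paste(c("Set1 Set2 Set3",
               "1    1    1",
               "1    2    2",
               "1    2    2",
               "     3    3",
               "     3    3",
               "          4"), collapse = "\n")

# With whitespace as the separator, leading blanks are ignored, so the
# remaining values slide into the leftmost columns and fill = TRUE pads
# the right-hand side with NA.
shifted <- read.table(text = txt, header = TRUE, fill = TRUE)

shifted$Set1  # 1 1 1 3 3 4   (values that belong to Set2 and Set3 end up here)
shifted$Set3  # 1 2 2 NA NA NA
```

Set1 now appears to hold six values instead of three, which is the bias the question refers to.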

Because the column size may vary, I tried to load it as a CSV file, but this had the effect that the median for Set1 became NA.

Same data as CSV:

Set1,Set2,Set3
1,1,1
1,2,2
1,2,2
,3,3
,3,3
,,4

So how can I plot all sets in a single diagram without the data being changed by R?

EDIT1: Gave more details on the data format used. I also emphasize that the column size might vary and is not fixed as in the example.
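On the CSV attempt: read.csv turns the empty fields into NA, and median() returns NA whenever its input contains an NA unless na.rm = TRUE is given. That is probably why the median for Set1 came out as NA. The question does not show how the median was computed, so this is an assumption. A minimal sketch, reading the CSV from an inline string instead of a file:

```r
# CSV text copied from the question; a real script would read the file instead.
csv_text <- "Set1,Set2,Set3
1,1,1
1,2,2
1,2,2
,3,3
,3,3
,,4"

sets <- read.csv(text = csv_text)

median(sets$Set1)                   # NA, because Set1 contains NA padding
median(sets$Set1, na.rm = TRUE)     # 1
sapply(sets, median, na.rm = TRUE)  # per-column medians with NAs ignored

# boxplot() drops NAs within each column, so the padding does not alter the plot.
bp <- boxplot(sets)
bp$n                                # non-missing values per set: 3 5 6
```

Under this reading the CSV data itself is intact. Only summary functions called without na.rm = TRUE are affected by the padding.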

This reads the indicated file with the indicated field widths. The first line (the header) is skipped and the indicated column names are used. Empty fields (na.strings="") are regarded as NA:

results <- read.fwf("data.dat", widths = c(5L, 5L, 5L), skip = 1, 
  na.strings = "", col.names = c("Set1", "Set2", "Set3"))

boxplot(results)

[Image: boxplot produced by the code above]

Note: One cannot tell the exact content of data.dat from the question, and that could be crucial, but for the purposes of this answer we have assumed this:

Lines <- c("Set1 Set2 Set3", 
           "1    1    1", 
           "1    2    2", 
           "1    2    2", 
           "     3    3", 
           "     3    3", 
           "          4")
writeLines(Lines, "data.dat")
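Since the question's edit says the column widths may vary, hard-coding widths = c(5L, 5L, 5L) may not fit every file. One possible variation, sketched below, derives the widths from the header line. It assumes that every column starts at the position where its header name starts and that values are left-aligned under it. The question does not confirm this layout, and the sketch breaks if it does not hold. The names path, raw, starts, ends and widths are introduced here for illustration, and a temporary file stands in for data.dat:

```r
# Assumed file contents, written to a temporary file instead of data.dat.
path <- tempfile(fileext = ".dat")
writeLines(c("Set1 Set2 Set3",
             "1    1    1",
             "1    2    2",
             "1    2    2",
             "     3    3",
             "     3    3",
             "          4"), path)

raw <- readLines(path)
header <- raw[1]

# Start position of each column name in the header line.
starts <- as.integer(gregexpr("\\S+", header)[[1]])

# Each field runs up to the character before the next start;
# the last field runs to the end of the longest line.
ends <- c(starts[-1] - 1L, max(nchar(raw)))
widths <- ends - starts + 1L

# "NA" is kept in na.strings in case read.fwf pads fields missing from short lines.
results <- read.fwf(path, widths = widths, skip = 1,
                    na.strings = c("", "NA"),
                    col.names = strsplit(trimws(header), "\\s+")[[1]])

boxplot(results)
```

For the sample layout this computes widths of 5, 5 and 4, and the blank fields are read as NA rather than being shifted to the left.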
