
Altering a data frame in R

I have a data frame whose first column runs from 1 to 365, like this:

c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2...

and whose second column holds times that repeat over and over, like this:

c(0,30,130,200,230,300,330,400,430,500,0,30,130,200,230,300,330,400,430,500...

So every 1 in the first column has a corresponding time in the second column; when the 2s begin, the times start over and each 2 has a corresponding time.

Occasionally I come across a stretch like this:

c(3,3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4,4,4...

c(0,30,130,200,230,330,400,430,500,0,30,130,200,230,300,330,400,430,500...

Here one of the 3s is missing, and the corresponding time of 300 is missing with it.

How can I go through my entire data frame and add these missing values? I need a way for R to identify any missing rows, then insert a row with the appropriate value (1 to 365) in column one and the appropriate time in column two. For the given example, R would add a row between 230 and 330, with 3 in the first column and 300 in the second. Some parts of the column are missing several consecutive values, not just one here and there.

EDIT: Solution with all 10 times clearly specified in advance, plus code tidy-up and commenting.

You need to create another data.frame containing every possible row and then merge it with your data.frame. The key aspect is the all.x = TRUE in the final merge, which keeps every row of the master set and so exposes the gaps in your data as NA. I simulated the gaps by sampling only 15 of the first 20 possible day/time combinations for your.dat.

# create vectors for the days and times
the.days    = 1:365
the.times   = c(0,30,100,130,200,230,330,400,430,500)   # the 10 times to repeat (substitute your own values)

# create a master data.frame with all the times repeated for each day, taking only the first 20 observations
dat.all = data.frame(x1=rep(the.days, each=10), x2 = rep(the.times,times = 365))[1:20,]

# mimic your data.frame with some gaps in it (only 15 of 20 observations are present)
your.sample = sample(1:20, 15)
your.dat = data.frame(x1=rep(the.days, each=10), x2 = rep(the.times,times = 365), x3 = rnorm(365*10))[your.sample,]

# left outer join merge to include ALL of the master set and all of your matching subset, filling blanks with NA
merge(dat.all, your.dat, all.x = TRUE)

Here is the output from the merge, showing all 20 possible records with the gaps clearly visible as NA:

   x1  x2          x3
1   1   0          NA
2   1  30  1.23128294
3   1 100  0.95806838
4   1 130  2.27075361
5   1 200  0.45347199
6   1 230 -1.61945983
7   1 330          NA
8   1 400 -0.98702883
9   1 430          NA
10  1 500  0.09342522
11  2   0  0.44340164
12  2  30  0.61114408
13  2 100  0.94592127
14  2 130  0.48916825
15  2 200  0.48850478
16  2 230          NA
17  2 330  0.52789171
18  2 400 -0.16939587
19  2 430  0.20961745
20  2 500          NA
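The same idea can be tried on the exact situation from the question. The following is only a minimal sketch: it assumes the ten times listed in the question are the complete set for every day, and the x3 column is a hypothetical placeholder for whatever measurements the real data frame holds. It uses expand.grid to build the master set and names the key columns explicitly in merge.

```r
# The ten time stamps from the question (assumed complete for every day)
the.times <- c(0, 30, 130, 200, 230, 300, 330, 400, 430, 500)

# Toy data: days 3 and 4, with the 300 reading missing for day 3
your.dat <- data.frame(
  x1 = c(rep(3, 9), rep(4, 10)),
  x2 = c(the.times[the.times != 300], the.times),
  x3 = seq_len(19)   # hypothetical placeholder measurements
)

# Every day/time combination that should exist
dat.all <- expand.grid(x2 = the.times, x1 = unique(your.dat$x1))

# Left outer join on day and time; rows absent from your.dat get NA in x3
filled <- merge(dat.all, your.dat, by = c("x1", "x2"), all.x = TRUE)

# Order by day, then time, so each inserted row sits in its expected place
filled <- filled[order(filled$x1, filled$x2), ]
rownames(filled) <- NULL

# The rows that had to be inserted
filled[is.na(filled$x3), ]
```

For the full data set, x1 in expand.grid would be 1:365 rather than unique(your.dat$x1), so that days missing entirely are also restored.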

Here are a few NA-handling functions that could help you get started. For the insertion task, you should provide your own data using dput or a reproducible example.

df <- data.frame(x = sample(c(1, 2, 3, 4), 100, replace = TRUE),
                 y = sample(c(0,30,130,200,230,300,330,400,430,500), 100, replace = TRUE))

# Set the first 20 values of x to NA, and every y equal to 0 to NA
df[1:20, "x"] <- NA
df$y <- ifelse(df$y == 0, NA, df$y)

# Columns x and y now have NAs in different places.

# Logical test for NA
is.na(df)

# Keep the non-NA cases of one column
df[!is.na(df$x),]
df[!is.na(df$y),]

# Returns the rows that are complete in both columns
df[complete.cases(df),]

# Gives the cases that are incomplete.
df[!complete.cases(df),]

# Returns the cases without NAs
na.omit(df)
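These helpers can also be pointed at a data frame in which missing day/time rows have already been marked with NA, for example to list the gaps or count them per day. A small sketch, assuming a hypothetical data frame named merged with day in x1, time in x2 and a measurement in x3 (all names and values here are made up for illustration):

```r
# Hypothetical data: two days, three time slots, NA where a reading is absent
merged <- data.frame(
  x1 = c(1, 1, 1, 2, 2, 2),
  x2 = c(0, 30, 130, 0, 30, 130),
  x3 = c(0.5, NA, 1.2, NA, NA, 0.7)
)

# Day/time pairs with no reading
gaps <- merged[is.na(merged$x3), c("x1", "x2")]

# Number of missing readings per day
missing.per.day <- tapply(is.na(merged$x3), merged$x1, sum)

gaps
missing.per.day
```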
