How to change a column value, based on a combination of values from two other columns in R?
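I am trying to change the value of one column based on the values of two other columns. So far this has given me a bit of a headache, and I am not sure whether it is even possible.
My dataset looks like this. One column is time, and the other two columns reflect offspring-parent relationships. In odd cases, such as time point 1, offspring "D" appears in the dataset for the first time (it was absent at the previous time point) and acts as both offspring and parent at that same time point.
Data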
structure(list(time = c(0L, 0L, 0L, 1L, 1L, 1L, 2L, 2L, 2L),
offspring = c("A", "B", "C", "A", "D", "E", "A", "F", "G"
), parent = c(NA, NA, NA, "A", "B", "D", "A", "A", "F")), class = "data.frame", row.names = c(NA,
-9L))
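What I would like help with is:
Finding all offspring that exist at one time point but not at the previous one (time point 0 is not considered), and that act as both offspring and parent at that time point, like D and F.
Once I have found them, reducing that exact time point by 0.5, so that the result looks like this: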
time offspring parent
0 A NA
0 B NA
0 C NA
1 A A
0.5 D B
1 E D
2 A A
1.5 F A
2 G F
Any help or guidance on this problem would be greatly appreciated.
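Create 2 data frames that hold the first occurrence of each animal as a parent and as an offspring.
Join them, find the time and animal combinations where the same animal appears in both columns, and then update the time in the original data frame.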
df <-structure(list(time = c(0L, 0L, 0L, 1L, 1L, 1L, 2L, 2L, 2L),
offspring = c("A", "B", "C", "A", "D", "E", "A", "F", "G"),
parent = c(NA, NA, NA, "A", "B", "D", "A", "A", "F")), class = "data.frame",
row.names = c(NA, -9L))
library(dplyr)
#first row in which each letter appears as a parent, and first row in which it appears as an offspring
parents <-df %>% filter(complete.cases(.)) %>% group_by(parent) %>% slice(1) %>% select(time, parent)
offsprings <- df %>% group_by(offspring) %>% slice(1) %>% select(time, offspring)
#join on the shared column (time), stated explicitly rather than left for dplyr to guess
combined <- full_join(offsprings, parents, by = "time")
#rows where the names match for both parent and offspring
matchingrows <-which(combined$parent == combined$offspring)
#update the times
for (i in matchingrows){
row = which(df$time == combined$time[i] & df$offspring == combined$offspring[i])
df$time[row] <- df$time[row] - 0.5
}
df
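The same idea (an individual whose first appearance as an offspring and first appearance as a parent fall on the same time point) could also be sketched without the loop. This is only a sketch, assuming the column names from the question and a numeric `time` column; the names `first_offspring`, `first_parent`, `to_shift` and `shifted` are illustrative and not part of the original answer.

```r
library(dplyr)

df <- data.frame(
  time = c(0, 0, 0, 1, 1, 1, 2, 2, 2),
  offspring = c("A", "B", "C", "A", "D", "E", "A", "F", "G"),
  parent = c(NA, NA, NA, "A", "B", "D", "A", "A", "F")
)

# First time point at which each individual appears as an offspring
first_offspring <- df %>%
  group_by(offspring) %>%
  summarise(time = min(time), .groups = "drop")

# First time point at which each individual appears as a parent
first_parent <- df %>%
  filter(!is.na(parent)) %>%
  group_by(parent) %>%
  summarise(time = min(time), .groups = "drop")

# Individuals whose two "first appearances" coincide (time point 0 excluded)
to_shift <- inner_join(first_offspring, first_parent,
                       by = c("offspring" = "parent", "time" = "time")) %>%
  filter(time != 0)

# Shift only the matching (individual, time) rows by 0.5
shifted <- df %>%
  mutate(time = if_else(
    paste(offspring, time) %in% paste(to_shift$offspring, to_shift$time),
    time - 0.5,
    time
  ))

shifted
```

Because the join matches on both the individual and the time point, it yields one row per individual to shift, rather than every parent/offspring pairing within a time point.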
If you prefer, in data.table:
library(data.table)
DT <- data.table(time = c(0,0,0,1,1,1,2,2,2),
offspring = c('A', 'B', 'C', 'A', 'D', 'E', 'A', 'F', 'G'),
parent = c(NA, NA, NA, 'A', 'B', 'D', 'A', 'A', 'F'))
for (i in seq_len(nrow(DT))) {
DT[i, time := fifelse(time != 0 & offspring %chin% DT[, parent] & !(offspring %chin% DT[seq_len(i-1), offspring]),
time - 0.5,
time)]
}
> DT
time offspring parent
1: 0.0 A <NA>
2: 0.0 B <NA>
3: 0.0 C <NA>
4: 1.0 A A
5: 0.5 D B
6: 1.0 E D
7: 2.0 A A
8: 1.5 F A
9: 2.0 G F
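The row-by-row check above ("not seen as an offspring in any earlier row") appears to be equivalent to `!duplicated(offspring)`, so the loop could probably be collapsed into a single update by reference. A minimal sketch, assuming the same `DT` as above and that rows are already ordered by time:

```r
library(data.table)

DT <- data.table(time = c(0, 0, 0, 1, 1, 1, 2, 2, 2),
                 offspring = c("A", "B", "C", "A", "D", "E", "A", "F", "G"),
                 parent = c(NA, NA, NA, "A", "B", "D", "A", "A", "F"))

# Shift rows that are past time 0, are the first appearance of that offspring,
# and whose offspring also occurs somewhere in the parent column
DT[time != 0 & !duplicated(offspring) & offspring %chin% parent,
   time := time - 0.5]

DT
```

Like the loop, this checks whether the individual is a parent anywhere in the table, not only at the same time point.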
With dplyr:
library(dplyr)
library(tibble)
tbl <- tibble(time = c(0,0,0,1,1,1,2,2,2),
offspring = c('A', 'B', 'C', 'A', 'D', 'E', 'A', 'F', 'G'),
parent = c(NA, NA, NA, 'A', 'B', 'D', 'A', 'A', 'F'))
for (i in seq_len(nrow(tbl))) {
tbl[i,][['time']] <- tbl[i, ] %>% mutate(time = if_else(time != 0 &
offspring %in% tbl[['parent']] &
!(offspring %in% tbl[seq_len(i-1),][['offspring']]),
time - 0.5,
time)) %>% pull(time)
}
> tbl
# A tibble: 9 x 3
time offspring parent
<dbl> <chr> <chr>
1 0 A NA
2 0 B NA
3 0 C NA
4 1 A A
5 0.5 D B
6 1 E D
7 2 A A
8 1.5 F A
9 2 G F
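My solution may not be the most concise, but I was able to get it working and it should generalize to larger datasets. I am sure there are ways to improve it, so I would love to see what others come up with. First, I ran into a problem with 0 subscripts, so I added 2 to the time column and subtract it again at the end.
The idea is that I iterate over the rows and find the individuals that are offspring at the current time point (after time 0) but were not offspring at the previous one. Then I check which of those individuals are also parents at the current time point. I collect the rows in which these individuals appear as offspring at that time point into a vector, because we will delete them later. Then I create a new row with time - 0.5, that offspring, and its parent. I collect these new rows in a new data frame that will replace the deleted rows.
Because each time value is repeated across rows, I make both the vector of rows to delete and the data frame of rows to add unique. Then I apply the deletions and additions to the original data frame and make the data types consistent.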
parent_offspring <- data.frame(
"time" = c( rep(0,3), rep(1,3), rep(2,3)),
"offspring" = c("A","B","C","A","D","E","A","F","G"),
"parent" = c(NA, NA, NA, "A","B","D","A","A","F")
)
po<- parent_offspring
po$time <- po$time+2
delete_vec <- vector()
df_to_add <- data.frame()
for (i in seq_along(po$time)) {
q <- po$time[[i]] # Value of "Time" variable for the row
a <- which(po$time == q) # Rows sharing that value of "Time"
offspring_curr <- po$offspring[a] # Offspring at that time
b <- which(po$time==(q-1)) # Rows of offspring at Time-1
offspring_prev <- po$offspring[b] # Identities of offspring at Time-1
f<- offspring_curr[offspring_curr %in% offspring_prev == FALSE] # Which offspring at Time were not offspring at Time-1
if (length(f) == 0) {
next # skip ahead: every offspring at Time was already an offspring at Time-1
} else {
parents_curr <- po$parent[which(po$time == q)] # Parents at current time
parent_and_offspring_curr <- intersect(f,parents_curr) # Which new individuals are both parents and offspring at the current time
if (length(parent_and_offspring_curr) == 0) {
next # skip ahead: no individual is both parent and offspring
} else {
# NOTE: the `==` comparison and the c(...) row below assume exactly one such individual per time point
g<- which(po$time==q & po$offspring==parent_and_offspring_curr) # which offspring row is occupied by an individual who is both a parent and offspring at the current time
delete_vec <- append(delete_vec,g) #we'll be deleting those rows in the end so we'll keep track of them and save them in a vector
h<- po$parent[g] # this is the parent for the offspring/parent individual in the current time.
add_row<-c((q-.5), parent_and_offspring_curr, h) # make a new row with the fractional time, parent/offspring individual, and their parent for row when the parent/offspring individual is an offspring
df_to_add <- rbind(df_to_add,add_row) ## we'll add these rows at the end
}
}
}
delete_vec<-unique(delete_vec) ## iteration gave us duplicates
df_to_add <- unique(df_to_add) ## same as above
colnames(df_to_add) <- colnames(po) ## fix column names for new df
po<- po[-delete_vec,] ## remove the offspring rows for the parent/offspring individuals
po<-rbind(po,df_to_add) ## add the rows with fractional times
rownames(po) <- c(1:nrow(po)) ## fix the row numbers
po$time<- as.numeric(po$time) ## time was converted to character when put into a vector with letters
po$time <- po$time-2 ## back to the original time values
po
time offspring parent
1 0.0 A <NA>
2 0.0 B <NA>
3 0.0 C <NA>
4 1.0 A A
5 1.0 E D
6 2.0 A A
7 2.0 G F
8 0.5 D B
9 1.5 F A
You can then use dplyr::arrange to sort the rows in ascending order of time.
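For comparison, the rule described in this answer (new as an offspring relative to the previous time point, and a parent at the same time point) might also be sketched without a loop or the temporary +2 offset, by comparing pasted individual/time keys. This is only a base R sketch, assuming time steps of exactly 1 and the column names used above; `offs_key`, `prev_key`, `parent_key` and `shift` are illustrative names.

```r
po <- data.frame(
  time = c(rep(0, 3), rep(1, 3), rep(2, 3)),
  offspring = c("A", "B", "C", "A", "D", "E", "A", "F", "G"),
  parent = c(NA, NA, NA, "A", "B", "D", "A", "A", "F")
)

# Keys identifying an individual at a given time point
offs_key   <- paste(po$offspring, po$time)      # offspring at this time
prev_key   <- paste(po$offspring, po$time - 1)  # same individual one step earlier
parent_key <- paste(po$parent, po$time)         # parents at this time

# After time 0, not an offspring at the previous time point,
# and listed as a parent at the current time point
shift <- po$time > 0 & !(prev_key %in% offs_key) & offs_key %in% parent_key

po$time[shift] <- po$time[shift] - 0.5

# Sort by time in ascending order and reset the row names
po <- po[order(po$time), ]
rownames(po) <- NULL
po
```

The final `order()` step plays the same role as `dplyr::arrange(po, time)`.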