
Subset dataframe based on several conditions in R

I am trying to subset a large data frame (cyclone tracks in the Northern Hemisphere) on several conditions. The data are read in below:

centro <- read.table("https://forms.naturwissenschaften.ch/imilast/_ERAinterim_1.5_1979_MTEX/ERAinterim_1.5_NH_M02_19790101_20121231_MTEX.txt?_ga=2.18919096.1825595846.1546710263-1112023567.1546710263", sep="", fill = TRUE, nrows = 500,
                 header = FALSE, skip = 2) # read here only the first 500 rows

centro <- na.omit(centro)

colnames(centro) <- c("Code","CycloneNo","StepNo","DateI10","Year","Month","Day","Time","LongE","LatN","Intensity1","Intensity2","Intensity3")

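I want to subset only the cyclones (identified by the unique column CycloneNo) that form inside a spatial box (say, between -4 and 40 E longitude and between 32 and 45 N latitude) when the column StepNo == 1. Subsetting on the box alone is easy: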

centro_subs <- centro[centro$LongE>=-4 & centro$LongE <= 40 & centro$LatN>= 32 & centro$LatN <= 45,]

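However, I want to keep only the cyclones that formed in this box (when StepNo == 1), together with the rest of their tracks, even where those later steps fall outside the box.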

I tried to do it like this:

df_s <- centro[1,]
df_s[1,] <- NA # create an empty dataframe to be filled in the iteration


for (i in 1:length(unique(centro$CycloneNo))){
print(i)
a <- centro[centro$LongE[centro$StepNo==1]>= -4 & 
centro$LongE[centro$StepNo==1] <= 40 & 
centro$LatN[centro$StepNo==1]>= 32 & centro$LatN 
<=45[centro$StepNo==1],]
df_s <- rbind(a, df_s)
}

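However, this ends up producing an empty data frame filled with NAs. I know this is hard to describe here. I feel I am fairly close, but by now I am also worn out from trying to find new approaches.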

I don't think you want a loop. I'm sure this is not the most elegant way, but I think it works.

step1s <- subset(centro, StepNo == 1) # only take step 1 of all cyclones
keeps <- step1s$CycloneNo[step1s$LongE>=-4 & step1s$LongE <= 40 & step1s$LatN>= 32 & step1s$LatN <= 45] # find cyclone numbers for cyclones meeting the condition
centro_sub <- centro[centro$CycloneNo %in% keeps, ] # keep all steps of cyclones meeting the conditions
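To see what the three lines above do, here is a minimal sketch on a toy data frame. The data frame toy and its values are made up for illustration and are not taken from the ERA-Interim file; only the column names match. Cyclone 2 starts outside the box and is dropped even though its second step is inside, while cyclones 1 and 3 are kept with all of their steps.

```r
# Toy data (hypothetical values): three cyclones, a few steps each
toy <- data.frame(
  CycloneNo = c(1, 1, 1, 2, 2, 3, 3),
  StepNo    = c(1, 2, 3, 1, 2, 1, 2),
  LongE     = c(10, 45, 50, -20, 5, 40, 41),
  LatN      = c(35, 47, 50, 40, 40, 45, 46)
)

# Step 1 rows only: one row per cyclone, giving its starting position
step1s <- subset(toy, StepNo == 1)

# Cyclone numbers whose starting position lies inside the box (bounds inclusive)
keeps <- step1s$CycloneNo[step1s$LongE >= -4 & step1s$LongE <= 40 &
                          step1s$LatN >= 32 & step1s$LatN <= 45]

# All steps of those cyclones, including steps that later leave the box
toy_sub <- toy[toy$CycloneNo %in% keeps, ]
print(toy_sub)
```

The key point is that the box condition is evaluated only on the step 1 rows, and the resulting cyclone numbers are then used to pull the complete tracks from the full table.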

Josephus provided a good answer. Alternatively, you can do this in data.table, which may offer better readability at the cost of some speed.

library(data.table)
centro <- data.table(centro)
centro[CycloneNo %in% CycloneNo[StepNo == 1 & 
                                  LongE %between% c(-4,40) & 
                                  LatN %between% c(32,45)]]
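As a rough check of the data.table version, the sketch below runs the same idea on a small made-up table. The names toy_dt, genesis_in_box and toy_dt_sub and all values are hypothetical; the filter is split into two steps purely for readability. Note that %between% includes both bounds by default, so it matches the >= and <= comparisons used in the question.

```r
library(data.table)

# Toy data (hypothetical values): three cyclones, a few steps each
toy_dt <- data.table(
  CycloneNo = c(1, 1, 1, 2, 2, 3, 3),
  StepNo    = c(1, 2, 3, 1, 2, 1, 2),
  LongE     = c(10, 45, 50, -20, 5, 40, 41),
  LatN      = c(35, 47, 50, 40, 40, 45, 46)
)

# Cyclone numbers whose first step lies inside the box (bounds inclusive)
genesis_in_box <- toy_dt[StepNo == 1 &
                           LongE %between% c(-4, 40) &
                           LatN %between% c(32, 45),
                         CycloneNo]

# Keep every step of those cyclones, wherever the later steps are
toy_dt_sub <- toy_dt[CycloneNo %in% genesis_in_box]
print(toy_dt_sub)
```

Here cyclone 3 starts exactly on the corner of the box (40 E, 45 N) and is kept, which shows the inclusive bounds in action.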

