
Subset data frame based on range of values in second data frame

I am trying to create a subset of a data frame based on a range around the values in a second data frame. I have been researching this but cannot figure out how to go about it. I've used dummy data here, as both are large datasets with many columns.

Data Frame 1 (df1) has 50 columns and thousands of recordings at different latitudes:

Recording Latitude
BombusL 51.41
ApisM 51.67
BombusR 51.34

Data Frame 2 (df2) has several hundred towns, all at different latitudes. It is significantly smaller than df1:

Town Lat
Bristol 51.40
Merton 51.42
Horsham 51.33

I need a subset of df1 that only includes rows whose latitude is within 0.01 of a latitude in df2. So the code needs to go through every row of df1 and test its latitude against every row of df2. The output would include only rows from df1 where the latitude value is within 0.01 of a value in df2$Lat.

From the example, the following rows would be included:

Recording Latitude
BombusL 51.41
BombusR 51.34

I have the start of the code for a filter that I could then run through the data frame to create the subset:

LatFil <- df1$Latitude %in% df2$Lat

But I can't figure out how to enter the logical test of ± 0.01 around the values in df2$Lat.

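When precision is involved (i.e. you are adding or subtracting 0.01 from a floating-point number), it is better to use comparison operators than exact matching with %in%. Each latitude in df1 has to be tested against every latitude in df2; comparing the two columns directly would recycle df2$Lat and only compare values position by position.

# TRUE where a df1 latitude is within 0.01 of any df2 latitude
# (1e-8 is a small allowance for floating-point rounding)
near <- sapply(df1$Latitude, function(x) any(abs(x - df2$Lat) <= 0.01 + 1e-8))
subset(df1, near)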
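To make the "every row of df1 against every row of df2" test concrete, here is a minimal sketch using the dummy data from the question. It builds the full matrix of pairwise differences with outer(). The names tol, eps, diffs, keep and result are my own, and the eps slack of 1e-8 is an assumption to absorb floating-point rounding at the boundary, not something from the original post.

```r
# Sample data copied from the question
df1 <- data.frame(Recording = c("BombusL", "ApisM", "BombusR"),
                  Latitude  = c(51.41, 51.67, 51.34))
df2 <- data.frame(Town = c("Bristol", "Merton", "Horsham"),
                  Lat  = c(51.40, 51.42, 51.33))

tol <- 0.01   # range around each town's latitude
eps <- 1e-8   # assumed slack for floating-point rounding

# Matrix of absolute differences:
# one row per df1 recording, one column per df2 town
diffs <- abs(outer(df1$Latitude, df2$Lat, "-"))

# Keep a recording if it is within tolerance of at least one town
keep <- rowSums(diffs <= tol + eps) > 0
result <- df1[keep, ]
```

With thousands of recordings and a few hundred towns the matrix has at most a few million cells, which should fit comfortably in memory. For much larger inputs this approach gets expensive, since the matrix grows with nrow(df1) * nrow(df2).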

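Another option, which only works if every latitude is recorded to exactly two decimal places (as in the example), is exact matching against the three candidate values. Round both sides first, because df2$Lat + 0.01 is not guaranteed to be exactly equal to the stored value it is meant to match:

df2$Lat_hi <- df2$Lat + 0.01
df2$Lat_lo <- df2$Lat - 0.01

LatFil <- df1[round(df1$Latitude, 2) %in% round(c(df2$Lat, df2$Lat_hi, df2$Lat_lo), 2), ]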
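If the real latitudes carry more than two decimals, or the data frames are large enough that comparing every pair is slow, one possible alternative is to sort the town latitudes once and check each recording only against its nearest neighbours on either side. The sketch below uses base R's findInterval(). The helper within_tol() and its eps argument are hypothetical names introduced for illustration, and it assumes there are no NA latitudes.

```r
# Sample data copied from the question
df1 <- data.frame(Recording = c("BombusL", "ApisM", "BombusR"),
                  Latitude  = c(51.41, 51.67, 51.34))
df2 <- data.frame(Town = c("Bristol", "Merton", "Horsham"),
                  Lat  = c(51.40, 51.42, 51.33))

# Hypothetical helper: is each x within tol of at least one value in ref?
within_tol <- function(x, ref, tol = 0.01, eps = 1e-8) {
  ref <- sort(ref)                 # findInterval() needs sorted input
  n   <- length(ref)
  idx <- findInterval(x, ref)      # count of ref values <= x (0..n)
  below <- ref[pmax(idx, 1)]       # nearest ref at or below x (or smallest ref)
  above <- ref[pmin(idx + 1, n)]   # nearest ref above x (or largest ref)
  # Distance to the closer neighbour, with a small floating-point slack
  pmin(abs(x - below), abs(x - above)) <= tol + eps
}

LatFil <- df1[within_tol(df1$Latitude, df2$Lat), ]
```

Because only the two neighbouring towns are checked for each recording, this avoids building a full recordings-by-towns matrix.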


 