I have the following problem. I have two data frames, kl_df and IDlist:
head(kl_df)
STATIONS_ID MESS_DATUM QN_3 FX FM QN_4 RSK RSKF SDK SHK_TAG NM VPM PM TMK UPM TXK TNK TGK eor
1 73 2000-01-01 NA NA NA 10 2.9 7 0.0 6 8.0 5.6 NA -0.2 94 0.7 -1.7 -2.1 eor
2 73 2000-01-02 NA NA NA 10 0.0 0 1.6 5 7.3 6.2 NA 0.8 92 4.0 -1.4 -0.1 eor
3 73 2000-01-03 NA NA NA 10 0.0 0 0.0 0 8.0 5.7 NA -0.2 95 0.6 -1.3 -1.5 eor
4 73 2000-01-04 NA NA NA 10 0.8 8 0.8 0 7.7 5.9 NA 1.2 89 2.6 -0.4 -1.0 eor
5 73 2000-01-05 NA NA NA 10 0.0 0 1.1 0 5.7 6.6 NA 1.4 93 6.1 -0.7 0.0 eor
6 73 2000-01-06 NA NA NA 10 0.0 0 0.0 0 8.0 6.0 NA 0.1 98 1.4 -1.0 -1.0 eor
head(IDlist)
Stations_id von_datum bis_datum Stationshoehe geoBreite geoLaenge Stationsname Bundesland res
194 15 19510101 20190331 390 49.2346 10.9668 Abenberg Bayern annual
306 29 19510101 20190527 260 49.7175 10.9101 Adelsdorf (Klaeranlage) Bayern daily
485 46 19370101 20190528 325 48.9450 12.4639 Aholfing Bayern annual
606 55 19370101 20190528 509 47.8780 12.0239 Aibling, Bad-Ellmosen Bayern annual
684 63 19510101 20190527 747 47.8172 10.5374 Aitrang Bayern daily
857 73 19080101 20190528 340 48.6159 13.0506 Aldersbach-Kriestorf Bayern annual
var per hasfile
194 more_precip historical TRUE
306 more_precip historical TRUE
485 more_precip historical TRUE
606 more_precip historical TRUE
684 more_precip historical TRUE
857 more_precip historical TRUE
Each Stations_id appears only once in IDlist, while kl_df contains many rows per station ID. My goal is to append the variables "Stationshoehe", "geoBreite" and "geoLaenge" to kl_df, matched to the correct station ID in each row.
I tried to write a function. The idea of this function is to iterate through kl_df and for each iteration, go through IDlist$Stations_id in order to match the ID number. Afterwards the required variables are written into a list:
getcoords = function(){
  results=list()
  for (ID in kl_df$STATIONS_ID) {
    counter = 1
    for (i in IDlist$Stations_id){
      if (ID == i) {
        print(counter)
        append(results, values= c(IDlist$Stationshoehe[counter], IDlist$geoBreite[counter], IDlist$geoLaenge[counter]))
        next
      }
      else {
        counter = counter+1
        print(counter)
      }
    }
  }
  return(results)
}
datlist = getcoords()
But it only returns an empty list. The print(counter) line is for testing purposes only. The problem is that the counter always counts from 1 to length(IDlist$Stations_id). Example of the printed output:
[1] 538
[1] 539
[1] 540
[1] 541
[1] 542
[1] 543
[1] 544
[1] 545
[1] 546
[1] 547
[1] 548
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10
Question: How can I fix the function, or is there a better way to accomplish the goal? Thank you very much!
What about writing:
results=append(results, values= c(IDlist$Stationshoehe[counter], IDlist$geoBreite[counter], IDlist$geoLaenge[counter]))
in the if (ID == i) block?
R does not modify an argument passed to a function in place. append returns a new list with the element added, so you need to store that return value somewhere (see https://www.rdocumentation.org/packages/base/versions/3.6.1/topics/append ).
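A minimal sketch of that behaviour, using a throwaway list rather than the data from the question:

```r
results <- list()

# Calling append() without assigning its result leaves `results` unchanged
append(results, values = 1)
length(results)  # 0

# Assigning the return value back is what actually grows the list
results <- append(results, values = 1)
length(results)  # 1
```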
When you say "Problem is the counter always counts from 1 to length(IDlist$Stations_id)", this is the expected behavior of the code. If you want to stop as soon as you have found a matching IDlist$Stations_id, change your next (which has no effect here, since the else branch is not executed in that case) to a break.
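Putting both changes together, a corrected version of the loop could look roughly like the sketch below. It deviates from the original in a few ways that are my own assumptions: the data frames are passed as arguments instead of being read from the global environment, the counter is replaced by seq_along(), and the three values are wrapped in list() so that each row of kl_df yields one list element instead of three separate numbers. The IDlist rows are copied from the question; the third kl_df row (station 15) is invented for illustration.

```r
# Small stand-ins for the two data frames
kl_df  <- data.frame(STATIONS_ID = c(73, 73, 15))
IDlist <- data.frame(Stations_id   = c(15, 29, 73),
                     Stationshoehe = c(390, 260, 340),
                     geoBreite     = c(49.2346, 49.7175, 48.6159),
                     geoLaenge     = c(10.9668, 10.9101, 13.0506))

getcoords <- function(kl_df, IDlist) {
  results <- list()
  for (ID in kl_df$STATIONS_ID) {
    for (counter in seq_along(IDlist$Stations_id)) {
      if (ID == IDlist$Stations_id[counter]) {
        # Store the value returned by append(); list() keeps the three
        # coordinates together as a single element
        results <- append(results, list(c(IDlist$Stationshoehe[counter],
                                          IDlist$geoBreite[counter],
                                          IDlist$geoLaenge[counter])))
        break  # stop scanning IDlist once the station has been found
      }
    }
  }
  results
}

datlist <- getcoords(kl_df, IDlist)
length(datlist)  # 3, one entry per row of kl_df
```

Growing a list inside a nested loop copies it on every append, so for a data frame with many rows this is likely to be slow compared with a join.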
If I have understood the question correctly, you want to do a "left join" of the data frame kl_df with IDlist on the station ID column (STATIONS_ID in kl_df, Stations_id in IDlist), and then select the columns of interest from the joined data frame.
Below, I have created a simpler version of your two data frames. After the left join you should be able to tweak the select statement to keep only the columns of interest in the joined data frame.
> kl_df <- data.frame(STATIONS_ID=c(1,1,2,2), col_a=c(1,2,3,4), col_b=c(10,15,12,8))
> kl_df
STATIONS_ID col_a col_b
1 1 1 10
2 1 2 15
3 2 3 12
4 2 4 8
> IDlist <- data.frame(Stations_id=c(1,2,3), col_c=c(10,20,10), col_d=c(99,97,90))
> IDlist
Stations_id col_c col_d
1 1 10 99
2 2 20 97
3 3 10 90
Now for the left join using the dplyr package:
> library(dplyr)
> df <- left_join(kl_df, IDlist, by=c("STATIONS_ID"="Stations_id"))
> df <- df %>% select(c(STATIONS_ID,col_a,col_b,col_d))
> df
STATIONS_ID col_a col_b col_d
1 1 1 10 99
2 1 2 15 99
3 2 3 12 97
4 2 4 8 97
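Applied to the column names from the question, the same idea might look like the sketch below. It assumes dplyr is installed and uses only a trimmed subset of the rows and columns shown in the question. Selecting the coordinate columns from IDlist before the join, rather than afterwards, is just one possible ordering.

```r
library(dplyr)

# Trimmed stand-ins for the data frames in the question
kl_df <- data.frame(STATIONS_ID = c(73, 73),
                    MESS_DATUM  = c("2000-01-01", "2000-01-02"),
                    TMK         = c(-0.2, 0.8))
IDlist <- data.frame(Stations_id   = c(15, 73),
                     Stationshoehe = c(390, 340),
                     geoBreite     = c(49.2346, 48.6159),
                     geoLaenge     = c(10.9668, 13.0506),
                     Bundesland    = c("Bayern", "Bayern"))

# Keep only the key and the three columns to be appended
coords <- select(IDlist, Stations_id, Stationshoehe, geoBreite, geoLaenge)

# Every row of kl_df is kept; coordinates are filled in by station ID
kl_df_coords <- left_join(kl_df, coords, by = c("STATIONS_ID" = "Stations_id"))
kl_df_coords
```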
Note that the join functions in the dplyr package are typically much faster than a for loop, and usually faster than merge as well.
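If an extra package is not wanted, the same lookup can also be sketched in base R with match(), which returns, for each station ID in kl_df, the position of that ID in IDlist. This is an alternative I am adding, not part of the dplyr approach above. The IDlist rows are taken from the question, and the station 15 row in kl_df is invented for illustration.

```r
kl_df  <- data.frame(STATIONS_ID = c(73, 73, 15))
IDlist <- data.frame(Stations_id   = c(15, 29, 73),
                     Stationshoehe = c(390, 260, 340),
                     geoBreite     = c(49.2346, 49.7175, 48.6159),
                     geoLaenge     = c(10.9668, 10.9101, 13.0506))

coord_cols <- c("Stationshoehe", "geoBreite", "geoLaenge")

# Row of IDlist that belongs to each row of kl_df (NA if the ID is missing)
idx <- match(kl_df$STATIONS_ID, IDlist$Stations_id)

# Copy the three coordinate columns across in one vectorised step
kl_df[coord_cols] <- IDlist[idx, coord_cols]
kl_df
```

Because Stations_id is unique in IDlist, match() finds exactly one row per station, and stations that are absent from IDlist simply get NA in the new columns.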