
Incrementally appending a list in R

I have the following problem. I have two data frames, kl_df and IDlist:

head(kl_df)
  STATIONS_ID MESS_DATUM QN_3 FX FM QN_4 RSK RSKF SDK SHK_TAG  NM VPM PM  TMK UPM TXK  TNK  TGK eor
1          73 2000-01-01   NA NA NA   10 2.9    7 0.0       6 8.0 5.6 NA -0.2  94 0.7 -1.7 -2.1 eor
2          73 2000-01-02   NA NA NA   10 0.0    0 1.6       5 7.3 6.2 NA  0.8  92 4.0 -1.4 -0.1 eor
3          73 2000-01-03   NA NA NA   10 0.0    0 0.0       0 8.0 5.7 NA -0.2  95 0.6 -1.3 -1.5 eor
4          73 2000-01-04   NA NA NA   10 0.8    8 0.8       0 7.7 5.9 NA  1.2  89 2.6 -0.4 -1.0 eor
5          73 2000-01-05   NA NA NA   10 0.0    0 1.1       0 5.7 6.6 NA  1.4  93 6.1 -0.7  0.0 eor
6          73 2000-01-06   NA NA NA   10 0.0    0 0.0       0 8.0 6.0 NA  0.1  98 1.4 -1.0 -1.0 eor

head(IDlist)
    Stations_id von_datum bis_datum Stationshoehe geoBreite geoLaenge            Stationsname Bundesland    res
194          15  19510101  20190331           390   49.2346   10.9668                Abenberg     Bayern annual
306          29  19510101  20190527           260   49.7175   10.9101 Adelsdorf (Klaeranlage)     Bayern  daily
485          46  19370101  20190528           325   48.9450   12.4639                Aholfing     Bayern annual
606          55  19370101  20190528           509   47.8780   12.0239   Aibling, Bad-Ellmosen     Bayern annual
684          63  19510101  20190527           747   47.8172   10.5374                 Aitrang     Bayern  daily
857          73  19080101  20190528           340   48.6159   13.0506    Aldersbach-Kriestorf     Bayern annual
            var        per hasfile
194 more_precip historical    TRUE
306 more_precip historical    TRUE
485 more_precip historical    TRUE
606 more_precip historical    TRUE
684 more_precip historical    TRUE
857 more_precip historical    TRUE

IDlist has exactly one row per station (Stations_id is unique), whereas kl_df has many rows per station (STATIONS_ID is repeated). My goal is to add the variables "Stationshoehe", "geoBreite" and "geoLaenge" to kl_df, matched by station ID.

I tried to write a function. The idea is to iterate over the station IDs in kl_df and, for each one, loop through IDlist$Stations_id to find the matching station. The required variables are then appended to a list:

getcoords = function(){
 results=list()
 for (ID in kl_df$STATIONS_ID)  {
  counter = 1
  for (i in IDlist$Stations_id){
   if (ID == i) {
     print(counter)
     append(results, values= c(IDlist$Stationshoehe[counter], IDlist$geoBreite[counter], IDlist$geoLaenge[counter]))
     next
   }
   else {
    counter = counter+1
    print(counter)
   }
  }
 }
 return(results)
}
datlist = getcoords()

But it only returns an empty list. The print(counter) lines are for testing purposes only. Another problem is that the counter always runs from 1 to length(IDlist$Stations_id). Example of the printed output:

[1] 538
[1] 539
[1] 540
[1] 541
[1] 542
[1] 543
[1] 544
[1] 545
[1] 546
[1] 547
[1] 548
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10

Question: How can I fix the function, or is there a better way to accomplish this? Thank you very much!

What about writing the following inside the if (ID == i) block?

results <- append(results, values = c(IDlist$Stationshoehe[counter], IDlist$geoBreite[counter], IDlist$geoLaenge[counter]))

Functions such as append do not modify the object you pass to them (R uses copy-on-modify semantics). append returns a new list with the values added, so you need to store that result somewhere, for example by assigning it back to results (see https://www.rdocumentation.org/packages/base/versions/3.6.1/topics/append).
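A minimal sketch of this behavior, using the station 73 values from the IDlist output above as sample data. It also shows a second pitfall in the original call: append combines its arguments with c(), so a numeric vector appended to a list is split into separate list elements; wrapping the values in list() is one way to keep each triple together.

```r
# Start with an empty list, as in getcoords()
results <- list()

# append() returns a new list; without an assignment the result is discarded
append(results, values = c(340, 48.6159, 13.0506))
length(results)   # 0: results itself was not modified

# Assigning the return value keeps it, but c() splits the vector into 3 elements
results <- append(results, values = c(340, 48.6159, 13.0506))
length(results)   # 3

# Wrapping the values in list() stores them as a single element instead
results2 <- append(list(), values = list(c(340, 48.6159, 13.0506)))
length(results2)  # 1
```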

As for "the counter always counts from 1 to length(IDlist$Stations_id)": that is the expected behavior of this code. If you want to stop as soon as you find a matching IDlist$Stations_id, replace next with break. (next has no effect here: once the if branch runs, the else branch is skipped, so there is nothing left in that iteration to skip.)
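Combining the assignment and the break, the loop could look like the sketch below. It is an illustration rather than the original poster's code: it passes the data frames as arguments instead of reading globals, replaces the manual counter with seq_along(), wraps the three values in list() so each match is stored as one element, and runs on small hand-made data frames that use the question's column names (values for stations 15 and 73 copied from the IDlist output).

```r
# Corrected nested loop: keep append()'s result and stop at the first match
getcoords <- function(kl_df, IDlist) {
  results <- list()
  for (ID in kl_df$STATIONS_ID) {
    for (counter in seq_along(IDlist$Stations_id)) {
      if (ID == IDlist$Stations_id[counter]) {
        # list() keeps the three values together as one element per kl_df row
        results <- append(results, list(c(IDlist$Stationshoehe[counter],
                                          IDlist$geoBreite[counter],
                                          IDlist$geoLaenge[counter])))
        break  # matching station found; skip the remaining IDlist rows
      }
    }
  }
  results
}

# Illustrative data with the question's column names
IDlist <- data.frame(Stations_id   = c(15, 73),
                     Stationshoehe = c(390, 340),
                     geoBreite     = c(49.2346, 48.6159),
                     geoLaenge     = c(10.9668, 13.0506))
kl_df <- data.frame(STATIONS_ID = c(73, 73, 15))

datlist <- getcoords(kl_df, IDlist)
```

Note that a kl_df row whose station is missing from IDlist adds nothing to results, so the list can end up shorter than nrow(kl_df); a join does not have that problem.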

If I have understood the question correctly, you want a "left join" of kl_df with IDlist, matching kl_df$STATIONS_ID to IDlist$Stations_id, and then to select the columns of interest from the joined data frame.
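In base R, the same left join can be sketched with merge(); all.x = TRUE keeps every row of kl_df even when its station has no match in IDlist. The data frames here are small hand-made examples using the question's column names (the MESS_DATUM values and the two-station IDlist are illustrative only).

```r
# Illustrative data with the question's column names
kl_df <- data.frame(STATIONS_ID = c(73, 73, 15),
                    MESS_DATUM  = as.Date(c("2000-01-01", "2000-01-02", "2000-01-01")))
IDlist <- data.frame(Stations_id   = c(15, 73),
                     Stationshoehe = c(390, 340),
                     geoBreite     = c(49.2346, 48.6159),
                     geoLaenge     = c(10.9668, 13.0506),
                     Stationsname  = c("Abenberg", "Aldersbach-Kriestorf"))

# Left join: keep all kl_df rows, take only the key and the three wanted columns
joined <- merge(kl_df,
                IDlist[, c("Stations_id", "Stationshoehe", "geoBreite", "geoLaenge")],
                by.x = "STATIONS_ID", by.y = "Stations_id",
                all.x = TRUE)
joined
```

By default merge() sorts the result by the key column, so the original row order of kl_df is not preserved.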

Below, I have created simplified versions of your two data frames. After the left join, you can adjust the select() call to keep only the columns you need in the joined data frame.

> kl_df <- data.frame(STATIONS_ID=c(1,1,2,2), col_a=c(1,2,3,4), col_b=c(10,15,12,8))
> kl_df
  STATIONS_ID col_a col_b
1           1     1    10
2           1     2    15
3           2     3    12
4           2     4     8

> IDlist <- data.frame(Stations_id=c(1,2,3), col_c=c(10,20,10), col_d=c(99,97,90))
> IDlist
  Stations_id col_c col_d
1           1    10    99
2           2    20    97
3           3    10    90

Now for the left join using the dplyr package:

> library(dplyr)

> df <- left_join(kl_df, IDlist, by=c("STATIONS_ID"="Stations_id"))
> df <- df %>% select(c(STATIONS_ID,col_a,col_b,col_d))
> df
  STATIONS_ID col_a col_b col_d
1           1     1    10    99
2           1     2    15    99
3           2     3    12    97
4           2     4     8    97

Note that a join works on whole columns at once and is generally much faster than nested for loops; the dplyr join functions are also typically faster than base R's merge() on large data.
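A vectorized base R alternative to the loop is match(), which looks up the IDlist row for every kl_df row in one call and leaves the row order of kl_df unchanged. A minimal sketch on hand-made data with the question's column names (values for stations 15 and 73 copied from the IDlist output; station 999 is a hypothetical ID with no IDlist entry, included to show that unmatched rows get NA).

```r
# Illustrative data; station 999 is hypothetical and has no IDlist entry
kl_df <- data.frame(STATIONS_ID = c(73, 73, 15, 999))
IDlist <- data.frame(Stations_id   = c(15, 73),
                     Stationshoehe = c(390, 340),
                     geoBreite     = c(49.2346, 48.6159),
                     geoLaenge     = c(10.9668, 13.0506))

# For each kl_df row, the position of its station in IDlist (NA if absent)
idx <- match(kl_df$STATIONS_ID, IDlist$Stations_id)

# Copy the three columns across; NA positions produce NA values
cols <- c("Stationshoehe", "geoBreite", "geoLaenge")
kl_df[cols] <- IDlist[idx, cols]
kl_df
```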

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint this content, please cite the site URL or the original address. For questions, contact: yoyou2525@163.com.

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM