Conditionally replace column names in a dataframe based on values in another dataframe

I have downloaded a table of stream diversion data ("df_downloaded"). The column names of this table are primarily taken from the ID numbers of the gauging stations.

I want to conditionally replace the ID numbers that have been used as column names with the station names, which will make the data more readable when I share the results. I created a table ("stationIDs") with the ID numbers and station names to use as a reference for changing the column names of "df_downloaded".

I can replace the column names individually, but I want to write a loop of some kind that goes through all of the columns of "df_downloaded" and renames those whose IDs are listed in the dataframe "stationIDs".

An example of what I'm trying to do is below.

Downloaded Data ("df_downloaded")

A portion of the downloaded data is similar to this:

df_downloaded <- data.frame(Var1 = seq(as.Date("2012-01-01"),as.Date("2012-12-01"), by="month"),
                            Var2 = sample(50:150,12, replace =TRUE),
                            Var3 = sample(10:100,12, replace =TRUE),
                            Var4 = sample(15:45,12, replace =TRUE),
                            Var5 = sample(50:200,12, replace =TRUE),
                            Var6 = sample(15:100,12, replace =TRUE),
                            Var7 = c(rep(0,3),rep(13,6),rep(0,3)),
                            Var8 = rep(5,12))
colnames(df_downloaded) <- c("Diversion.Date","360410059","360410060",
                             "360410209","361000655","361000656","Irrigation","Seep") 

df_downloaded # not run
# 
#    Diversion.Date 360410059 360410060 360410209 361000655 361000656 Irrigation Seep
# 1      2012-01-01        93        57        28       101        16          0    5
# 2      2012-02-01       102        68        19       124        98          0    5
# 3      2012-03-01       124        93        36       109        56          0    5
# 4      2012-04-01        94        96        23        54        87         13    5
# 5      2012-05-01        83        70        43       119        15         13    5
# 6      2012-06-01        78        63        45       195        15         13    5
# 7      2012-07-01        86        77        20       130        63         13    5
# 8      2012-08-01       118        29        27       118        57         13    5
# 9      2012-09-01       142        18        45       116        27         13    5
# 10     2012-10-01        74        68        34       182        79          0    5
# 11     2012-11-01       106        48        27        95        74          0    5
# 12     2012-12-01        91        41        20       179        55          0    5

Reference Table ("stationIDs")

stationIDs <- data.frame(ID = c("360410059", "360410060", "360410209", "361000655", "361000656"),
                         Names = c("RimView", "IPCO", "WMA.Ditch", "RV.Bypass", "LowerFalls"))
stationIDs # not run
#
#          ID      Names
# 1 360410059    RimView
# 2 360410060       IPCO
# 3 360410209  WMA.Ditch
# 4 361000655  RV.Bypass
# 5 361000656 LowerFalls

I can replace the column names in "df_downloaded" using individual statements. I show the first three iterations below.
After three iterations, "RimView", "IPCO", and "WMA.Ditch" have replaced their respective gauge ID numbers.

names(df_downloaded) <- gsub(stationIDs$ID[1], stationIDs$Names[1], names(df_downloaded))

# head(df_downloaded)
#   Diversion.Date RimView 360410060 360410209 361000655 361000656 Irrigation Seep
# 1     2012-01-01      93        57        28       101        16          0    5
# 2     2012-02-01     102        68        19       124        98          0    5
# 3     2012-03-01     124        93        36       109        56          0    5
# 4     2012-04-01      94        96        23        54        87         13    5
# 5     2012-05-01      83        70        43       119        15         13    5
# 6     2012-06-01      78        63        45       195        15         13    5

names(df_downloaded) <- gsub(stationIDs$ID[2], stationIDs$Names[2], names(df_downloaded))

# head(df_downloaded)
#   Diversion.Date RimView IPCO 360410209 361000655 361000656 Irrigation Seep
# 1     2012-01-01      93   57        28       101        16          0    5
# 2     2012-02-01     102   68        19       124        98          0    5
# 3     2012-03-01     124   93        36       109        56          0    5
# 4     2012-04-01      94   96        23        54        87         13    5
# 5     2012-05-01      83   70        43       119        15         13    5
# 6     2012-06-01      78   63        45       195        15         13    5

names(df_downloaded) <- gsub(stationIDs$ID[3], stationIDs$Names[3], names(df_downloaded))

# head(df_downloaded)
#   Diversion.Date RimView IPCO WMA.Ditch 361000655 361000656 Irrigation Seep
# 1     2012-01-01      93   57        28       101        16          0    5
# 2     2012-02-01     102   68        19       124        98          0    5
# 3     2012-03-01     124   93        36       109        56          0    5
# 4     2012-04-01      94   96        23        54        87         13    5
# 5     2012-05-01      83   70        43       119        15         13    5
# 6     2012-06-01      78   63        45       195        15         13    5

If I try to do the renaming using a for loop, I end up with NAs for column names.

for(i in seq_along(names(df_downloaded))){
    names(df_downloaded) <- gsub(stationIDs$ID[i], stationIDs$Names[i], names(df_downloaded))
}

# head(df_downloaded)
#           NA  NA NA NA  NA NA NA NA
# 1 2012-01-01  93 57 28 101 16  0  5
# 2 2012-02-01 102 68 19 124 98  0  5
# 3 2012-03-01 124 93 36 109 56  0  5
# 4 2012-04-01  94 96 23  54 87 13  5
# 5 2012-05-01  83 70 43 119 15 13  5
# 6 2012-06-01  78 63 45 195 15 13  5

I really want to be able to change the names with a for loop or something similar, because the number of stations that I download data from changes depending on the years that I am analyzing.

Thanks for taking the time to look at my question.

We can use match

#Convert factor columns to character
stationIDs[] <- lapply(stationIDs, as.character)
#Match names of df_downloaded with stationIDs$ID
inds <- match(names(df_downloaded), stationIDs$ID)
#Replace the matched name with corresponding Names from stationIDs
names(df_downloaded)[which(!is.na(inds))] <- stationIDs$Names[inds[!is.na(inds)]]

df_downloaded
#   Diversion.Date RimView IPCO WMA.Ditch RV.Bypass LowerFalls Irrigation Seep
#1      2012-01-01     142   14        41       200         79          0    5
#2      2012-02-01      97  100        35       176         22          0    5
#3      2012-03-01      85   59        26        88         71          0    5
#4      2012-04-01      68   49        34        63         15         13    5
#5      2012-05-01      62   58        44        87         16         13    5
#6      2012-06-01      70   59        33       145         87         13    5
#7      2012-07-01     112   65        25        52         64         13    5
#8      2012-08-01      75   12        27       103         19         13    5
#9      2012-09-01      73   65        36       172         68         13    5
#10     2012-10-01      87   35        27       146         42          0    5
#11     2012-11-01     122   17        33       183         32          0    5
#12     2012-12-01     108   65        15       120         99          0    5
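
A note on why the for loop in the question ends up with NA names, as far as I can tell: the loop index runs over the columns of df_downloaded (eight of them), while stationIDs has only five rows. From the sixth iteration on, stationIDs$ID[i] is NA, and gsub() returns NA for every element when the pattern is NA. Below is a minimal sketch of a repaired loop that iterates over the rows of the reference table instead. The toy data is a cut-down, made-up version of the question's tables, and the exact comparison with == in place of gsub() is my own substitution, so that an ID can never match part of a longer column name.

```r
# Toy data mirroring the question (values are made up for illustration)
df_downloaded <- data.frame(
  Diversion.Date = as.Date(c("2012-01-01", "2012-02-01")),
  `360410059` = c(93, 102),
  `360410060` = c(57, 68),
  Irrigation = c(0, 0),
  check.names = FALSE  # keep the numeric-looking column names untouched
)
stationIDs <- data.frame(
  ID = c("360410059", "360410060", "360410209"),
  Names = c("RimView", "IPCO", "WMA.Ditch"),
  stringsAsFactors = FALSE
)

# Iterate over the rows of the reference table, not the columns of the data
for (i in seq_len(nrow(stationIDs))) {
  hit <- names(df_downloaded) == stationIDs$ID[i]  # exact match only
  names(df_downloaded)[hit] <- stationIDs$Names[i] # no-op when nothing matches
}

names(df_downloaded)
# "Diversion.Date" "RimView" "IPCO" "Irrigation"
```

The third station ("360410209") has no column in the toy data, so it is simply skipped, which is the behaviour wanted when the set of stations changes from year to year.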

You can do this with dplyr and tidyr. You basically want to make your data long so that the IDs are in a single column, which lets you join against your reference table of IDs and names. Then you can make your data wide again.

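library(dplyr)
library(tidyr)

df_downloaded %>%
   # Long format: one row per date and station ID
   gather(ID, value, -Diversion.Date, -Irrigation, -Seep) %>%
   # Look up the station name for each ID
   left_join(stationIDs, by = "ID") %>%
   dplyr::select(-ID) %>%
   spread(Names, value)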
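
If the goal is only to rename columns, the reshape can be skipped. Here is a minimal sketch, assuming dplyr 1.0.0 or later, where rename() accepts tidyselect helpers such as any_of(). The toy data and the names `lookup` and `renamed` are illustrative and not part of the original post.

```r
library(dplyr)

# Toy data mirroring the question (values are made up for illustration)
df_downloaded <- data.frame(
  Diversion.Date = as.Date(c("2012-01-01", "2012-02-01")),
  `360410059` = c(93, 102),
  `360410060` = c(57, 68),
  Irrigation = c(0, 0),
  check.names = FALSE
)
stationIDs <- data.frame(
  ID = c("360410059", "360410060", "360410209"),
  Names = c("RimView", "IPCO", "WMA.Ditch"),
  stringsAsFactors = FALSE
)

# Named character vector: names are the new column names, values the old ones
lookup <- setNames(stationIDs$ID, stationIDs$Names)

# any_of() ignores IDs that are not present as columns instead of raising an error
renamed <- rename(df_downloaded, any_of(lookup))

names(renamed)
```

Unlike the gather/spread round trip, this keeps the original column order and does not touch the data values.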
