How do I get rid of multiple columns with the same name in R?
I'm gathering SAT scores by school district in Texas, along with each district's education spending. The SAT data come in CSV files that are split by year. I want to consolidate the scores into my data frame that holds the education spending, without creating multiple columns for Total, Math score, Reading score, etc.
I've tried the different types of join functions (semi_join, full_join, left_join, etc.), but none of them seems to address the issue I am having.
temp1 <- left_join(temp, sat17, by = c("District", "year")) %>%
  left_join(., sat16, by = c("District", "year")) %>%
  left_join(., sat15, by = c("District", "year")) %>%
  left_join(., sat14, by = c("District", "year")) %>%
  left_join(., sat13, by = c("District", "year")) %>%
  left_join(., sat12, by = c("District", "year")) %>%
  left_join(., sat11, by = c("District", "year"))
The output gives me columns Math.x, Math.y, Total.x, Total.y, and so on for each joined data frame. Also, sat17 includes a column called ERW instead of Reading, because the test changed that year. I want to keep ERW separate and have the rest of the Reading, Math, and Total scores line up under one column each.
I think what you want is to bind them together, that is, to stack them one on top of the other.
Try:
do.call(rbind, dfs)  # dfs is a list of data frames; rbind() needs identical column names
or using dplyr (bind_rows() comes from dplyr, not purrr):
library(dplyr)
bind_rows(dfs, .id = NULL)
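To illustrate the difference between the two calls, here is a minimal sketch with two made-up data frames (the names sat16_toy and sat17_toy and all values are invented, not taken from the question's data). Base rbind() stops when column names differ, as they would for sat17 with its ERW column, while dplyr::bind_rows() matches columns by name and fills the gaps with NA.

```r
library(dplyr)

# Hypothetical toy data: one "old test" year and one "new test" year
sat16_toy <- data.frame(District = c("A", "B"), year = 2016,
                        Math = c(500, 520), Reading = c(480, 510))
sat17_toy <- data.frame(District = c("A", "B"), year = 2017,
                        Math = c(505, 515), ERW = c(490, 500))

dfs <- list(sat16_toy, sat17_toy)

# Base rbind() requires matching column names, so this call errors;
# tryCatch() turns the error into a marker string instead of stopping
rbind_result <- tryCatch(do.call(rbind, dfs),
                         error = function(e) "rbind failed")

# bind_rows() matches columns by name and fills missing ones with NA
stacked <- bind_rows(dfs)
```

In stacked, the 2016 rows have NA in ERW and the 2017 rows have NA in Reading, which keeps ERW separate as the question asks.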
Or, if you want to bind them at the .csv level to begin with, put all your files into a subdirectory called "data". You can try something like this:
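library(tidyverse)  # loads tibble, dplyr, purrr, readr, and tidyr

# list the CSV files in ./data, read each one, and stack the results
binded_data <- tibble(filenames = list.files("data", pattern = "\\.csv$", full.names = TRUE)) %>%
  mutate(yearly_sat = map(filenames, read_csv)) %>%
  unnest(cols = yearly_sat)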
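A self-contained variant of the same idea, as a sketch only: it writes two tiny CSV files into a temporary folder (the file names and values are invented for illustration), reads them back, and stacks them. The source_file column is an assumption added here so that each row records which file it came from.

```r
library(readr)
library(purrr)
library(dplyr)

# Hypothetical setup: write two small CSV files into a temporary folder
data_dir <- file.path(tempdir(), "sat_data")
dir.create(data_dir, showWarnings = FALSE)
write_csv(data.frame(District = "A", year = 2016, Math = 500, Reading = 480),
          file.path(data_dir, "sat16.csv"))
write_csv(data.frame(District = "A", year = 2017, Math = 505, ERW = 490),
          file.path(data_dir, "sat17.csv"))

files <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)

# Read each file into a named list, then stack the list;
# .id stores the list names (the file names) in a new column
all_sat <- files %>%
  set_names(basename(files)) %>%
  map(read_csv, show_col_types = FALSE) %>%
  bind_rows(.id = "source_file")
```

Because bind_rows() matches columns by name, a file with ERW and a file with Reading can be stacked without renaming anything first.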
dplyr automatically renames any columns that you don't join by and that have a matching column name in the joined data set. In your case, since you only join with by=c("District", "year"), any other columns that share a name get renamed. The columns of the starting data set get .x appended to their names, while the columns of the data set being left joined get .y appended.
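The renaming can be reproduced on a small scale. This is a sketch with invented data frames (spending, sat16_toy, and sat15_toy are hypothetical stand-ins for temp and the yearly SAT files):

```r
library(dplyr)

# Hypothetical toy data (values invented for illustration)
spending  <- data.frame(District = c("A", "B"), year = 2016,
                        Spending = c(9000, 9500))
sat16_toy <- data.frame(District = c("A", "B"), year = 2016,
                        Math = c(500, 520))
sat15_toy <- data.frame(District = c("A", "B"), year = 2015,
                        Math = c(495, 525))

joined <- spending %>%
  left_join(sat16_toy, by = c("District", "year")) %>%
  # Math already exists after the first join, so the second join
  # produces Math.x (existing) and Math.y (from sat15_toy)
  left_join(sat15_toy, by = c("District", "year"))

names(joined)
# "District" "year" "Spending" "Math.x" "Math.y"
```

Math.y is all NA here because no 2015 row matches the 2016 spending rows, which is another sign that chained joins are not the right tool for year-by-year files.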
If you want Math, Reading, and Total each in a single column, you need to stack the data sets on top of each other with dplyr::bind_rows():
combined_sat <- dplyr::bind_rows(sat17, sat16, sat15, sat14, sat13, sat12, sat11)
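Once the years are stacked, a single join onto the spending data avoids the .x/.y suffixes entirely. A minimal sketch, assuming the spending data frame has one row per District and year (all names and values below are invented for illustration):

```r
library(dplyr)

# Hypothetical toy data standing in for temp, sat16, and sat17
spending  <- data.frame(District = c("A", "A"), year = c(2016, 2017),
                        Spending = c(9000, 9200))
sat16_toy <- data.frame(District = "A", year = 2016,
                        Math = 500, Reading = 480, Total = 980)
sat17_toy <- data.frame(District = "A", year = 2017,
                        Math = 505, ERW = 490, Total = 995)

# Stack the yearly files first; ERW and Reading stay separate columns
combined_sat <- bind_rows(sat17_toy, sat16_toy)

# Then join once, so no score column name collides
result <- left_join(spending, combined_sat, by = c("District", "year"))
```

Here result has one Math column and one Total column, with ERW filled only for 2017 and Reading only for 2016.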