
How to change a column inside a data.frame based on an index in R?

I have a data.frame with two columns, Name and Index, and 2 million rows.

I am sure that all Index values were entered correctly, but I need to verify the 'Name' column.

How can I change all Name values based on the Index values?

Let me give an example. Suppose we have the following data.frame 'db':

db
Index Name
1      Carlos
2      John
3      Bill
4      Mary
1      Cerlas

As shown, 'Name' should be the same for every row with a given Index value, but someone wrote it incorrectly.
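Before correcting anything, it can help to find which Index values actually have more than one spelling. The sketch below rebuilds the sample 'db' (a hypothetical reconstruction of the table above, with Name kept as character) and uses dplyr's n_distinct() to flag inconsistent groups. It assumes dplyr 1.0.0 or later for the `.groups` argument.

```r
library(dplyr)

# Hypothetical reconstruction of the sample data frame from the question.
db <- data.frame(
  Index = c(1L, 2L, 3L, 4L, 1L),
  Name  = c("Carlos", "John", "Bill", "Mary", "Cerlas"),
  stringsAsFactors = FALSE  # keep Name as character, not factor
)

# Count distinct spellings per Index; a count above 1 flags an inconsistent group.
inconsistent <- db %>%
  group_by(Index) %>%
  summarise(n_names = n_distinct(Name), .groups = "drop") %>%
  filter(n_names > 1)

print(inconsistent)
```

On the sample data only Index 1 is reported, with two distinct spellings.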

How would I correct it? Is there a solution using 'dplyr' or 'tidyr'?

I tried the following code, but it did not work:

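# For each distinct Index, overwrite its Name values with the first Name in that group.
# Note: the column is `Name`; a misspelling such as `Nome` refers to a column db lacks.
for (i in unique(db$Index)) {
    db$Name[db$Index == i] <- db$Name[db$Index == i][1]
}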

Thanks
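With 2 million rows, a loop like the one in the question is also slow, because every iteration compares all rows against `i`. A vectorized base-R alternative (a sketch, not taken from the answers below, and assuming the first Name per Index is the correct one) uses match(), which returns the position of the first match of each element. Indexing Name by those positions copies each group's first Name onto every row in one pass. The sample 'db' is a hypothetical reconstruction of the table in the question.

```r
# Hypothetical reconstruction of the sample data frame from the question.
db <- data.frame(
  Index = c(1L, 2L, 3L, 4L, 1L),
  Name  = c("Carlos", "John", "Bill", "Mary", "Cerlas"),
  stringsAsFactors = FALSE
)

# For each row, the position of the first row that has the same Index.
first_pos <- match(db$Index, db$Index)

# Replace every Name with the Name found at its group's first row.
db$Name <- db$Name[first_pos]

print(db)
```

Here first_pos is 1 2 3 4 1, so the fifth row's 'Cerlas' becomes 'Carlos'.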

If the first 'Name' entry for each 'Index' is spelled correctly, we can use data.table to assign to 'Name' the first element of 'Name' within each 'Index' group:

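library(data.table)
# Convert db to a data.table by reference, then overwrite Name in place
# with the first Name observed within each Index group.
setDT(db)[, Name := Name[1L], by = Index]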
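To check the data.table one-liner, the sketch below runs it on a hypothetical reconstruction of the sample 'db'. Note that setDT() converts the data.frame to a data.table in place and `:=` modifies the Name column by reference, so no reassignment to 'db' is needed.

```r
library(data.table)

# Hypothetical reconstruction of the sample data frame from the question.
db <- data.frame(
  Index = c(1L, 2L, 3L, 4L, 1L),
  Name  = c("Carlos", "John", "Bill", "Mary", "Cerlas"),
  stringsAsFactors = FALSE
)

# Same call as in the answer: first Name per Index, assigned by reference.
setDT(db)[, Name := Name[1L], by = Index]

print(db)
```

Row order is preserved; only the fifth row changes, from 'Cerlas' to 'Carlos'.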

You could also do it in dplyr. Here, the first name given for each Index becomes the name for every row with that Index:

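library(dplyr)
# Within each Index group, replace every Name with the group's first Name,
# then drop the grouping and store the result back in db.
db <- db %>% group_by(Index) %>%
        mutate(Name = Name[1]) %>%
        ungroup()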
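Both answers assume the first spelling in each group is the correct one. If that is not guaranteed, one alternative (not part of either answer) is to take the most frequent spelling per Index. The sketch below uses hypothetical data in which the first entry for Index 1 is the typo; most_common() is a hypothetical helper built on table() and which.max(). On a tie, table() returns names in sorted order and which.max() picks the first, so ties resolve to the earliest name in sort order.

```r
library(dplyr)

# Hypothetical data: Index 1 has one typo ("Cerlas") listed first, then two "Carlos" rows.
db <- data.frame(
  Index = c(1L, 2L, 1L, 3L, 1L),
  Name  = c("Cerlas", "John", "Carlos", "Bill", "Carlos"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the most frequent value in a character vector.
most_common <- function(x) names(which.max(table(x)))

db_fixed <- db %>%
  group_by(Index) %>%
  mutate(Name = most_common(Name)) %>%  # length-1 result is recycled within the group
  ungroup()

print(db_fixed)
```

Calling table() once per group is slower than taking the first element, which may matter at 2 million rows; the first-element approach is the cheaper choice when the first entry can be trusted.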

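Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.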

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM