Automatically finding and converting values in R

Question

I have a sample dataset with 45 rows, shown below.

 itemid                    title release_date
16    573          Body Snatchers          1993
17    670          Body Snatchers          1993
41   1645        Butcher Boy, The          1998
42   1650        Butcher Boy, The          1998
1     218               Cape Fear          1991
18    673               Cape Fear          1962
27   1234   Chairman of the Board          1998
43   1654   Chairman of the Board          1998
2     246             Chasing Amy          1997
5     268             Chasing Amy          1997
11    309                Deceiver          1997
37   1606                Deceiver          1997
28   1256 Designated Mourner, The          1997
29   1257 Designated Mourner, The          1997
12    329      Desperate Measures          1998
13    348      Desperate Measures          1998
9     304           Fly Away Home          1996
15    500           Fly Away Home          1996
26   1175               Hugo Pool          1997
39   1617               Hugo Pool          1997
31   1395       Hurricane Streets          1998
38   1607       Hurricane Streets          1998
10    305          Ice Storm, The          1997
21    865          Ice Storm, The          1997
4     266      Kull the Conqueror          1997
19    680      Kull the Conqueror          1997
22    876             Money Talks          1997
24    881             Money Talks          1997
35   1477              Nightwatch          1997
40   1625              Nightwatch          1997
6     274                 Sabrina          1995
14    486                 Sabrina          1954
33   1442     Scarlet Letter, The          1995
36   1542     Scarlet Letter, The          1926
3     251         Shall We Dance?          1996
30   1286         Shall We Dance?          1937
32   1429           Sliding Doors          1998
45   1680           Sliding Doors          1998
20    711  Substance of Fire, The          1996
44   1658  Substance of Fire, The          1996
23    878          That Darn Cat!          1997
25   1003          That Darn Cat!          1997
34   1444          That Darn Cat!          1965
7     297             Ulee's Gold          1997
8     303             Ulee's Gold          1997

What I am trying to do is convert the itemid based on the movie title, but only when the release dates are the same. For example, the movie 'Ulee's Gold' has two item IDs, 297 and 303. I am trying to automate the process of checking a movie's release dates and, if they are the same, replacing that movie's itemid[2] with itemid[1]. For the time being I have done it manually by extracting the item IDs into two vectors, x and y, and then replacing them element by element. I want to know if there is a better way to do this: the sample has only 18 movies with multiple IDs, but the full dataset has a few hundred, and finding and processing them manually will be very time-consuming.

Here is the code I have used to get this task done:

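# x: duplicate item IDs; y: the IDs that should replace them
x <- c(670,1650,1654,268,1606,1257,348,500,1617,1607,865,680,881,1625,1680,1658,1003,303)
y <- c(573,1645,1234,246,309,1256,329,304,1175,1395,305,266,876,1477,1429,711,878,297)

# Look rows up by itemid value; indexing with x[i] directly only
# works when each itemid equals its row number
for (i in seq_along(x)) {
  df$itemid[df$itemid == x[i]] <- y[i]
}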

Is there a better way to get this done?
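A minimal base-R sketch of building x and y automatically instead of by hand, assuming (as in the sample) that the row whose itemid should be kept comes first within each title/release_date group. It uses a small subset of the rows shown above; `key` and `dup` are hypothetical helper names. duplicated() flags every row whose title and release date already appeared earlier, and match() finds the first row with the same key.

```r
# Small subset of the sample data from the question
df <- data.frame(
  itemid = c(218, 673, 297, 303, 878, 1003, 1444),
  title = c("Cape Fear", "Cape Fear", "Ulee's Gold", "Ulee's Gold",
            "That Darn Cat!", "That Darn Cat!", "That Darn Cat!"),
  release_date = c(1991, 1962, 1997, 1997, 1997, 1997, 1965),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one key per title/release_date combination
key <- paste(df$title, df$release_date, sep = "\t")

# TRUE for rows whose title and release date already appeared earlier
dup <- duplicated(key)

# x: the duplicate IDs; y: the itemid of the first row with the same key
x <- df$itemid[dup]
y <- df$itemid[match(key[dup], key)]

# Same replacement as the manual loop, done in one vectorized step
df$itemid[dup] <- y
print(df)
```

Cape Fear keeps both IDs here because its two release dates differ, which matches the rule in the question.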

Answer 1

I think you can do this straightforwardly in dplyr.

Using your comment above, a brief example: 使用上面的评论，一个简单的示例：

itemid <- c(878,1003,1444,297,303)
title <- c(rep("That Darn Cat!", 3), rep("Ulee's Gold", 2))
year <- c(1997,1997,1965,1997,1997)

temp <- data.frame(itemid,title,year)
temp

library(dplyr)

temp %>% group_by(title,year) %>% mutate(itemid1 = min(itemid))

(I renamed 'release_date' to 'year' for no particular reason.) This groups the rows by title and year, finds the minimum itemid within each group, and mutate() creates a new variable, itemid1, holding that lowest itemid.

which gives:

#  itemid          title year itemid1
#1    878 That Darn Cat! 1997     878
#2   1003 That Darn Cat! 1997     878
#3   1444 That Darn Cat! 1965    1444
#4    297    Ulee's Gold 1997     297
#5    303    Ulee's Gold 1997     297
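For comparison, the same grouping can be sketched in base R without dplyr using ave(); this is a swapped-in technique, not the answer's own method. It reuses the toy data above. Pasting title and year into a single grouping key (`group_key`, a hypothetical name) is an assumption made here so that only title/year combinations that actually occur form groups.

```r
# Toy data from the answer above
itemid <- c(878, 1003, 1444, 297, 303)
title <- c(rep("That Darn Cat!", 3), rep("Ulee's Gold", 2))
year <- c(1997, 1997, 1965, 1997, 1997)
temp <- data.frame(itemid, title, year, stringsAsFactors = FALSE)

# Hypothetical grouping key: one value per title/year combination
group_key <- paste(temp$title, temp$year)

# ave() applies FUN within each group and returns a vector aligned
# with the original rows, similar to group_by() + mutate()
temp$itemid1 <- ave(temp$itemid, group_key, FUN = min)

print(temp)
```

Assigning the ave() result to temp$itemid instead of a new column would overwrite the duplicate IDs in place, which is what the question asks for.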

Answer 1: score 0, accepted, posted 2015-01-28 20:01:06
