Issues with tidyverse
I ran the following code a few months ago and it worked fine:
ceo1_nochange <- ceo1 %>%
  group_by(ISIN, year) %>%
  nest(.key = "OTHER_DATA") %>%
  group_by(ISIN) %>%
  mutate(OTHER_DATA_LAG = lag(OTHER_DATA, 1),
         OTHER_DATA_LEAD = lead(OTHER_DATA, 1),
         KEEP = pmap(list(OTHER_DATA_LAG, OTHER_DATA, OTHER_DATA_LEAD), function(x, y, z) {
           isTRUE(all_equal(x["DirectorID"], y["DirectorID"])) ||
             isTRUE(all_equal(y["DirectorID"], z["DirectorID"]))
         })) %>%
  filter(unlist(KEEP)) %>%
  select(-OTHER_DATA_LAG, -OTHER_DATA_LEAD, -KEEP) %>%
  unnest() %>%
  ungroup()
My goal was to identify the observations in which DirectorID did not change from year to year.
But now I get the following error:
Error: Problem with `mutate()` input `KEEP`.
x argument is of length zero
i Input `KEEP` is `pmap(...)`.
i The error occurred in group 1: ISIN = "AN8068571086".
Run `rlang::last_error()` to see where the error occurred.
In addition: Warning message:
Error: Problem with `mutate()` input `KEEP`.
x argument is of length zero
i Input `KEEP` is `pmap(...)`.
i The error occurred in group 1: ISIN = "AN8068571086".
Run `rlang::last_error()` to see where the error occurred.
Can anybody shed some light?
This is a sample dataset:
"ROW,ISIN,YEAR,DIRECTOR_NAME,DIRECTOR_ID
1,US9898171015,2006,Thomas (Tom) E Davin,2247441792
2,US9898171015,2006,Matthew (Matt) L Hyde,4842568996
3,US9898171015,2007,James (Jim) M Weber,3581636766
4,US9898171015,2007,Matthew (Matt) L Hyde,4842568996
5,US9898171015,2007,David (Dave) M DeMattei,759047198
6,US9898171015,2008,James (Jim) M Weber,3581636766
7,US9898171015,2008,Matthew (Matt) L Hyde,4842568996
8,US9898171015,2008,David (Dave) M DeMattei,759047198
9,US9898171015,2009,William (Bill) Milroy Barnum Jr,20462211719
10,US9898171015,2009,James (Jim) M Weber,3581636766
11,US9898171015,2009,Matthew (Matt) L Hyde,4842568996
12,US9898171015,2009,David (Dave) M DeMattei,759047198
13,US9898171015,2010,William (Bill) Milroy Barnum Jr,20462211719
14,US9898171015,2010,James (Jim) M Weber,3581636766
15,US9898171015,2010,Matthew (Matt) L Hyde,4842568996
16,US9898171015,2011,Sarah (Sally) Gaines McCoy,11434863691
17,US9898171015,2011,William (Bill) Milroy Barnum Jr,20462211719
18,US9898171015,2011,James (Jim) M Weber,3581636766
19,US9898171015,2011,Matthew (Matt) L Hyde,4842568996
20,US9898171015,2012,Sarah (Sally) Gaines McCoy,11434863691
21,US9898171015,2012,Ernest R Johnson,40425210975
22,US9898171015,2013,Sarah (Sally) Gaines McCoy,11434863691
23,US9898171015,2013,Ernest R Johnson,40425210975
24,US9898171015,2013,Travis D Smith,53006212569
25,US9898171015,2014,Sarah (Sally) Gaines McCoy,11434863691
26,US9898171015,2014,Ernest R Johnson,40425210975
27,US9898171015,2014,Travis D Smith,53006212569
28,US9898171015,2015,Kalen F Holmes,11051172801
29,US9898171015,2015,Sarah (Sally) Gaines McCoy,11434863691
30,US9898171015,2015,Ernest R Johnson,40425210975
31,US9898171015,2015,Travis D Smith,53006212569
32,US9898171015,2016,Sarah (Sally) Gaines McCoy,11434863691
33,US9898171015,2016,Ernest R Johnson,40425210975
34,US9898171015,2016,Travis D Smith,53006212569
35,US9898171015,2017,Sarah (Sally) Gaines McCoy,11434863691
36,US9898171015,2017,Scott Andrew Bailey,174000000000
37,US9898171015,2017,Ernest R Johnson,40425210975
38,US9898171015,2017,Travis D Smith,53006212569
"
Can someone provide a clue?
I didn't find anything in the code that might be affected by any recent changes. The reason you are getting the error is the lag and lead functions. When you use them on the nested data frame column, they create NULL values at the beginning and end respectively. If you add a check for that inside the pmap call, it should work.
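To make the NULL padding concrete, here is a minimal sketch, assuming a recent dplyr (1.0 or later) and a made-up three-element list standing in for the nested OTHER_DATA column. The names other_data, lagged and led are illustrative only and do not come from the question.

```r
library(dplyr)

# A tiny list-column standing in for the nested OTHER_DATA column
# (made-up values, one small tibble per "year").
other_data <- list(
  tibble(DIRECTOR_ID = c(1, 2)),
  tibble(DIRECTOR_ID = c(1, 2)),
  tibble(DIRECTOR_ID = c(2, 3))
)

lagged <- lag(other_data, 1)   # shifted down: the first slot has no predecessor
led    <- lead(other_data, 1)  # shifted up: the last slot has no successor

# The padded slots hold NULL, which has length zero.
is.null(lagged[[1]])
is.null(led[[3]])
length(lagged[[1]])
```

Because the padded slots have length zero, a guard such as length(x) > 0 (or !is.null(x)) before comparing the data frames keeps the comparison from ever seeing a NULL.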
I made some other changes to the code as well:
.key has been deprecated in nest, so I used nest(OTHER_DATA = c(ROW, DIRECTOR_NAME, DIRECTOR_ID)) instead.
pmap_lgl (instead of pmap) is used so that you don't have to do unlist(KEEP) in filter.
unnest needs an explicit mention of the column to unnest, so I used unnest(cols = c(OTHER_DATA)).
library(tidyverse)
ceo1 %>%
  group_by(ISIN, YEAR) %>%
  nest(OTHER_DATA = c(ROW, DIRECTOR_NAME, DIRECTOR_ID)) %>%
  group_by(ISIN) %>%
  mutate(OTHER_DATA_LAG = lag(OTHER_DATA, 1),
         OTHER_DATA_LEAD = lead(OTHER_DATA, 1),
         KEEP = pmap_lgl(list(OTHER_DATA_LAG, OTHER_DATA, OTHER_DATA_LEAD), function(x, y, z) {
           # Only compare when all three years are present (no NULL from lag/lead)
           if (length(x) > 0 && length(y) > 0 && length(z) > 0)
             isTRUE(all_equal(x["DIRECTOR_ID"], y["DIRECTOR_ID"])) ||
               isTRUE(all_equal(y["DIRECTOR_ID"], z["DIRECTOR_ID"]))
           else FALSE
         })) %>%
  filter(KEEP) %>%
  select(-OTHER_DATA_LAG, -OTHER_DATA_LEAD, -KEEP) %>%
  unnest(cols = c(OTHER_DATA)) %>%
  ungroup()
# ISIN YEAR ROW DIRECTOR_NAME DIRECTOR_ID
# <chr> <int> <int> <chr> <dbl>
# 1 US9898171015 2007 3 James (Jim) M Weber 3581636766
# 2 US9898171015 2007 4 Matthew (Matt) L Hyde 4842568996
# 3 US9898171015 2007 5 David (Dave) M DeMattei 759047198
# 4 US9898171015 2008 6 James (Jim) M Weber 3581636766
# 5 US9898171015 2008 7 Matthew (Matt) L Hyde 4842568996
# 6 US9898171015 2008 8 David (Dave) M DeMattei 759047198
# 7 US9898171015 2013 22 Sarah (Sally) Gaines McCoy 11434863691
# 8 US9898171015 2013 23 Ernest R Johnson 40425210975
# 9 US9898171015 2013 24 Travis D Smith 53006212569
#10 US9898171015 2014 25 Sarah (Sally) Gaines McCoy 11434863691
#11 US9898171015 2014 26 Ernest R Johnson 40425210975
#12 US9898171015 2014 27 Travis D Smith 53006212569
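As a small illustration of why the switch from pmap to pmap_lgl removes the need for unlist, here is a sketch with made-up inputs. The lists a and b are hypothetical and unrelated to the ceo1 data.

```r
library(purrr)

# Two made-up parallel lists to compare element by element.
a <- list(1, 2, 3)
b <- list(1, 5, 3)

# pmap always returns a list, so filter() would need unlist() first.
as_list <- pmap(list(a, b), function(x, y) identical(x, y))

# pmap_lgl returns a plain logical vector that filter() can use directly.
as_lgl <- pmap_lgl(list(a, b), function(x, y) identical(x, y))

is.list(as_list)
is.logical(as_lgl)
identical(unlist(as_list), as_lgl)
```

pmap_lgl also fails loudly if the function ever returns something that is not a single logical value, which can surface problems earlier than unlist would.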