Build a tibble in R from various different counts
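I have a fairly simple question: if you have a raw dataset and you answer a question by filtering that dataset and computing values, how do you build a data frame/tibble of your answers?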
#load the packages
library(easypackages)
packages("tidyverse","readxl","sf","tmaptools","tmap","lubridate",
"lwgeom","Cairo","nngeo","purrr","scales", "ggthemes","janitor")
polls<-st_as_sf(read.csv(url("https://www.caerphilly.gov.uk/CaerphillyDocs/FOI/Datasets_polling_stations_csv.aspx")),
coords = c("Easting","Northing"),crs = 27700)%>%
mutate(date = sample(seq(as.Date('2020/01/01'), as.Date('2020/05/31'), by="day"), 147))
test_stack<-polls%>%st_join(polls%>%st_buffer(dist=1000),join=st_within)%>%
filter(Ballot.Box.Polling.Station.x!=Ballot.Box.Polling.Station.y)%>%
add_count(Ballot.Box.Polling.Station.x)%>%
rename(number_of_neighbours = n)%>%
mutate(interval_date = date.x-date.y)%>%
subset(select = -c(6:8,10,11,13:18)) ## to keep only the number of neighbours, add %>% here and uncomment the next two lines
# distinct(Ballot.Box.Polling.Station.x,number_of_neighbours,date.x)%>%
# filter(number_of_neighbours >=2)
polls%>%mutate(id = as.numeric(row_number()))%>% mutate(thing = case_when(id %% 2 == 0 ~ "stuff",
id %% 2 !=0 ~ "type"))->polls
polls%>%filter(thing=="stuff"& Polling.District.Code =="AC")%>%count()
polls%>%filter(thing == "type" & Polling.District.Code =="IA")%>%count()
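How do I build a data frame where the row names are meaningful and the column holds the computed values?
So something like:
rowname     value
stuff AC    1
type IA     1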
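It sounds like you want to group_by the columns thing and Polling.District.Code, then summarize by computing the length of each group. If you want the summarised data frame without the geometry column, you need to use st_set_geometry(NULL):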
polls %>%
group_by(thing, Polling.District.Code) %>%
summarize(count = length(thing), .groups = "keep") %>%
st_set_geometry(NULL)
#> # A tibble: 147 x 3
#> # Groups: thing, Polling.District.Code [147]
#> thing Polling.District.Code count
#> * <chr> <chr> <int>
#> 1 stuff AC 1
#> 2 stuff AE 1
#> 3 stuff BB1 1
#> 4 stuff CA1 1
#> 5 stuff CB1 1
#> 6 stuff CC 1
#> 7 stuff CE 1
#> 8 stuff DA2 1
#> 9 stuff DB1 1
#> 10 stuff DB3 1
#> # ... with 137 more rows
Alternatively, if you want to keep the geometry, use:
polls %>%
group_by(thing, Polling.District.Code) %>%
summarize(count = length(thing), .groups = "keep")
#> Simple feature collection with 147 features and 3 fields
#> geometry type: POINT
#> dimension: XY
#> bbox: xmin: 310399 ymin: 186331 xmax: 325960 ymax: 207788
#> projected CRS: OSGB 1936 / British National Grid
#> # A tibble: 147 x 4
#> # Groups: thing, Polling.District.Code [147]
#> thing Polling.District.Code count geometry
#> <chr> <chr> <int> <POINT [m]>
#> 1 stuff AC 1 (311777 206968)
#> 2 stuff AE 1 (311734 206047)
#> 3 stuff BB1 1 (310577 205577)
#> 4 stuff CA1 1 (314777 202748)
#> 5 stuff CB1 1 (314777 202748)
#> 6 stuff CC 1 (314622 203396)
#> 7 stuff CE 1 (315255 201843)
#> 8 stuff DA2 1 (315780 200318)
#> 9 stuff DB1 1 (314693 199774)
#> 10 stuff DB3 1 (315034 199159)
#> # ... with 137 more rows
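To get from the grouped counts to the two-column layout sketched in the question, one option is to keep only the combinations of interest and paste the two keys into a single label. The following is a minimal sketch rather than a definitive solution: `toy`, `wanted`, and `answers` are hypothetical names, and `toy` is a small made-up tibble standing in for the non-spatial columns of `polls`. It also uses `n()` and `.groups = "drop"` in place of `length(thing)` and `.groups = "keep"`, so the result is an ungrouped tibble.

```r
library(dplyr)

# Hypothetical toy data standing in for the non-spatial columns of `polls`.
toy <- tibble(
  thing = c("stuff", "type", "stuff", "type"),
  Polling.District.Code = c("AC", "IA", "AC", "AE")
)

# Hypothetical lookup of the combinations we want to report.
wanted <- tibble(
  thing = c("stuff", "type"),
  Polling.District.Code = c("AC", "IA")
)

answers <- toy %>%
  group_by(thing, Polling.District.Code) %>%
  # n() counts the rows in each group; "drop" returns an ungrouped tibble.
  summarise(value = n(), .groups = "drop") %>%
  # Keep only the combinations listed in `wanted`.
  semi_join(wanted, by = c("thing", "Polling.District.Code")) %>%
  # Combine the two keys into one descriptive label column.
  mutate(rowname = paste(thing, Polling.District.Code)) %>%
  select(rowname, value)

print(answers)
```

A label column is used here instead of true row names because tibbles do not keep row names. With the real `polls` object, the geometry would presumably need to be dropped first, as in the `st_set_geometry(NULL)` example above.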
I think the answer is bind_rows:
polls%>%filter(thing=="stuff"& Polling.District.Code =="AC")%>%count()->a
polls%>%filter(thing == "type" & Polling.District.Code =="IA")%>%count()->b
bind_rows(a,b)->c
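The bound result above has no column saying which count is which. A possible refinement, sketched here on a hypothetical plain tibble `toy` rather than the real `polls` data, is to pass the pieces to `bind_rows()` as named arguments and set `.id`, which stores each name in a label column. The label strings and the `answers` name are illustrative choices, not part of the original answer.

```r
library(dplyr)

# Hypothetical toy data standing in for the non-spatial columns of `polls`.
toy <- tibble(
  thing = c("stuff", "type", "stuff", "type"),
  Polling.District.Code = c("AC", "IA", "AC", "AE")
)

# Each filter + count() yields a one-row tibble with a single column `n`.
a <- toy %>%
  filter(thing == "stuff" & Polling.District.Code == "AC") %>%
  count()
b <- toy %>%
  filter(thing == "type" & Polling.District.Code == "IA") %>%
  count()

# Named inputs plus .id add a column recording which input each row came from.
answers <- bind_rows("stuff AC" = a, "type IA" = b, .id = "rowname")

print(answers)
```

Storing the result under a name other than `c` also avoids masking base R's `c()` function. On the sf object itself, each count may still carry a geometry column, so dropping the geometry first is probably the simpler route.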