How do I use num_range to select rows which all contain the same first 4 digits in one specific column? (hoping to use dplyr/tidyverse)

My question is best asked in 2 parts:

I am dealing with a dataset that looks at forest product usage across many countries. Each row represents a household from one of these countries (about 30 in total). Each country has a 4-digit code, but the dataset has no column for the country code. The way to deduce which households came from which country is to use the household ID ("ghousecode"). ghousecode is a 7-digit code whose first 4 digits are the country code. For example, if Bolivia's country code were 3024, then a household in Bolivia could be 3024105 or 3024999.

I want code that selects all the entries for a specific country. I am using the tidyverse, so I thought of using select() and num_range(), but it hasn't worked. I don't get an error message, but when I look at my output I can tell it hasn't worked. Here is my current code:

    #forest_use_tibble is a tibble with observations on forest usage from many countries
    #I selected a subset of the original file's variables. 

    forest_use_simpler <- select(forest_use_tibble, ghousecode, year, product, income, amount, unit)

    #take Bolivia, whose country ID is 3024. This means that each ghousecode that begins with
    #3024 is from Bolivia.
    #but each ghousecode is 3024xxx with three other numbers after it.

    x = 3024
    Bolivia <- select(forest_use_simpler, num_range("x", 001:999), everything())

    #my goal: a new tibble/dataframe that has only the entries from Bolivia
    #there is no separate column for country ID, unfortunately.

Any ideas?

Second part of the question: Is there a way to query just one of the columns (i.e. variables, in this case ghousecode) for the num_range? The way I have it above, it looks as though it would search all variables in forest_use_simpler, so it might include another country's household if the digits 3024 appeared somewhere other than ghousecode.

Thank you!

(Note: I have also tried putting in 3024 directly where x is, to no avail. Thanks again for all help.)

If the ghousecode is consistently formatted with 7 digits, how about something like this?

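    library(tidyverse)

    #toy data: one household outside Bolivia, one inside
    df <-
      tibble(
        ghousecode = c(2039434, 3024105),
        year = c(2019, 2019)
      )

    #dropping the last 3 digits of the 7-digit ID leaves the 4-digit country code
    df %>%
      mutate(country_code = floor(ghousecode / 1000)) %>%
      filter(country_code == 3024)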
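The same row filter can probably be written without the helper column. The sketch below is only an illustration on made-up IDs: it assumes ghousecode is stored as a number and always has 7 digits, and the names `by_range` and `by_prefix` are hypothetical. The first version uses dplyr's between() to keep IDs from 3024000 to 3024999, which is close to what the question was reaching for with num_range(); the second compares the first four characters of the ID as text.

```r
library(dplyr)
library(tibble)

# Toy data (made-up IDs) in the same shape as the example above
df <- tibble(
  ghousecode = c(2039434, 3024105, 3024999),
  year = c(2019, 2019, 2019)
)

# Option 1: keep rows whose ID falls in the numeric range 3024000..3024999
by_range <- df %>%
  filter(between(ghousecode, 3024000, 3024999))

# Option 2: compare the leading characters of the ID as text;
# format(..., scientific = FALSE) avoids strings such as "3e+06"
by_prefix <- df %>%
  filter(startsWith(format(ghousecode, scientific = FALSE, trim = TRUE), "3024"))
```

Both versions look only at the ghousecode column, so a 3024 appearing in any other variable cannot affect the result.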

select() chooses columns, while filter() chooses rows.
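To make that distinction concrete, here is a small sketch of what num_range() does inside select(). The tables `wide` and `forest` are hypothetical and exist only for this illustration. num_range("x", 1:2) matches column names built from the literal prefix "x" plus a number; it does not look at the variable x or at any values in the rows.

```r
library(dplyr)
library(tibble)

# Hypothetical wide table whose column NAMES are x1, x2, x3
wide <- tibble(id = 1:2, x1 = c(10, 20), x2 = c(30, 40), x3 = c(50, 60))

# num_range("x", 1:2) picks the columns named x1 and x2
picked <- select(wide, num_range("x", 1:2))

# A table shaped like the one in the question has no columns named x1..x999,
# so num_range() matches nothing and everything() returns every column
forest <- tibble(ghousecode = c(2039434, 3024105), year = c(2019, 2019))
unchanged <- select(forest, num_range("x", 1:999), everything())
```

This would explain why the original call produced no error and no visible change: no column was named x1 through x999, and all rows were kept because select() never drops rows.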
