
R: Extracting part of string in a column

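Hello Stack Overflow community,

I have the following problem with my dataset. I need to extract the first four characters of each string in the "BMI" column and then convert them to numeric.

For example: "23.3 [21.9-24.6]" should become 23.3

You can find the data here: https://github.com/tanaytuncer/LifeExpectancy_BMI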

library(mosaic)
library(tidyverse)

path <- "/Users/tanaytuncer/Desktop/Quantitative Datenanalyse/BMI.csv"
df_BMI <- read.csv(path)
df_BMI <- df_BMI[-1:-3, ]
df_BMI <- df_BMI %>%
  rename(country = "X",
         "2000" = "X2000",
         "2001" = "X2001",
         "2002" = "X2002",
         "2003" = "X2003",
         "2004" = "X2004",
         "2005" = "X2005",
         "2006" = "X2006",
         "2007" = "X2007",
         "2008" = "X2008",
         "2009" = "X2009",
         "2010" = "X2010",
         "2011" = "X2011",
         "2012" = "X2012",
         "2013" = "X2013",
         "2014" = "X2014",
         "2015" = "X2015"
         )
df_BMI <- df_BMI %>%
  gather("year", "BMI", 2:17)

We can use substr to take the first four characters and convert them with as.numeric:

df_BMI$BMI <- as.numeric(substr(df_BMI$BMI, 1, 4))
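The substr approach depends on every estimate being exactly four characters wide. As a minimal sketch on made-up values (the vector bmi_raw is hypothetical and not taken from the dataset), the following compares it with a width-independent alternative that strips everything from the opening bracket onward using base R's sub:

```r
# Hypothetical sample values in the same "estimate [low-high]" layout
bmi_raw <- c("23.3 [21.9-24.6]", "100.2 [98.7-101.5]", "No data")

# Fixed width: takes characters 1 to 4, so "100.2" is cut to "100."
fixed <- suppressWarnings(as.numeric(substr(bmi_raw, 1, 4)))

# Width independent: drop the bracketed interval and any space before it
estimate <- sub("\\s*\\[.*$", "", bmi_raw)
flexible <- suppressWarnings(as.numeric(estimate))

print(fixed)
print(flexible)
```

Entries that do not contain a number, such as the hypothetical "No data" above, become NA in both versions; suppressWarnings only hides the coercion warning.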

Or use parse_number from readr, which returns the first number found in each string:

library(readr)
df_BMI$BMI <- parse_number(df_BMI$BMI)

Without gathering into "long" format, we can also use across to apply parse_number to every column except country:

library(dplyr)
df_BMI1 <-  df_BMI %>%
       mutate(across(-country, parse_number))
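To see what across does on the wide layout, here is a small sketch on a made-up data frame (wide_raw and its values are hypothetical, not rows from the dataset). It assumes dplyr 1.0.0 or later and readr are installed:

```r
library(dplyr)
library(readr)

# Hypothetical wide data: one row per country, one column per year
wide_raw <- data.frame(
  country = c("A", "B"),
  `2000` = c("23.3 [21.9-24.6]", "25.8 [24.9-26.7]"),
  `2001` = c("23.4 [22.0-24.7]", "26.1 [25.2-27.0]"),
  check.names = FALSE,       # keep "2000" instead of "X2000"
  stringsAsFactors = FALSE   # parse_number expects character input
)

# Apply parse_number to every column except country
wide_num <- wide_raw %>%
  mutate(across(-country, parse_number))

str(wide_num)
```

Each year column becomes numeric in place, while country stays a character column.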

If we read the file with check.names = FALSE, the year columns keep their original names and the rename step can be avoided:

df_BMI <- read.csv("https://raw.githubusercontent.com/tanaytuncer/LifeExpectancy_BMI/main/BMI.csv", check.names = FALSE)
df_BMI <- df_BMI[-1:-3, ]
names(df_BMI)[1] <- "country"
