How to change some character values to numeric ones in a dataframe + selecting values within a column using values from another column in R

First of all, I'm sorry if what I'm asking is trivial to you; I am new to coding and bad at it too.

I'm working with a dataframe obtained from an Excel file, and it looks pretty basic. I uploaded the actual Excel file here, so you can use it if you need to: https://easyupload.io/s6hs29

All I want to do is calculate the mean of all the values in column "Distance.moved.1" that are within the time values 0:30:00-0:30:30 and 0:37:00-0:37:30 in column "X.1".

The first problem I need to fix is the class of the columns of the dataframe.

First, I need to change the class from character to numeric if I want to calculate a mean. Problem is, column "Distance.moved.1" contains some characters in the first elements.

If I run:

as.numeric(DF$Distance.moved.1)   

it prints all the numbers and puts NA in place of all the character values. I would be fine with this, but if I then check the column class, it is still character.

So I thought, maybe I can skip the first character values and only convert the actual numbers to numeric, so the elements from the 5th to the last one in the dataframe. Is this even possible? I tried this:

as.numeric(DF$Distance.moved.1[5:1350])

It seemed to work: no errors, and the printed values were correct. But once I run this:

class(DF$Distance.moved.1[5])

I still get "character". What am I doing wrong? I guess I could delete the first 5 characters from the column and retry but there must be a better way.

For the second problem (which I can't test until I figure out how to get numeric values): I want to calculate the mean of all the values in column "Distance.moved.1" that go from the time 0:30:00-0:30:30 to 0:37:00-0:37:30, found in column "X.1".

A way could be using the element numbers for the 2 rows, something like this:

Mean1 <-  mean(c(DF$Distance.moved.1[65:79]))

But what if I want to use the time frames I have? Can I keep column "X.1" as character and just run this successfully?

Mean1 <-  mean(c(DF$Distance.moved.1["0:30:00-0:30:30":"0:37:00-0:37:30"]))

Please help and thank you!

Your data requires quite a comprehensive treatment: reading in the data and adjusting column names, creating proper dates, finally filtering and computing the mean. Below is a solution using the packages tidyverse (for general data wrangling) and lubridate (for dealing with dates).

I suggest you check the documentation to see what each function does. To do so, place your cursor within the function name and hit F1 in RStudio. I also suggest you read up on date-time objects.
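To give a feel for the interval objects used in the code below, here is a minimal sketch. The date 2022-01-01 is only a placeholder (the data has times but no date), and the names `bin` and `period` are made up for illustration:

```r
library(lubridate)

# One 30-second bin, pinned to a placeholder date (assumption: the date is arbitrary)
bin <- interval(ymd_hms("2022-01-01 00:30:00"), ymd_hms("2022-01-01 00:30:30"))

# A longer, hypothetical period that fully contains the bin
period <- interval(ymd_hms("2022-01-01 00:30:00"), ymd_hms("2022-01-01 00:31:00"))

bin %within% period   # TRUE: the bin lies entirely inside the period
period %within% bin   # FALSE: the period extends past the bin
int_length(bin)       # 30 (length of the bin in seconds)
```

This is the same `%within%` test that the filter step relies on, just applied to a single pair of intervals.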

You will probably have to tweak the code below, because my column names differ from the ones you describe in your question and it is unclear what "start" refers to in the time field. But hopefully this can get you started:

library(tidyverse)
library(lubridate)

# load data, rename columns
df <- read_csv("~/Downloads/NN trial 6-7-9-10.csv",
               skip = 5L,
               col_names = c("id", "time", "n", "average", "se"))

# create complete date-times, convert into interval objects
df <- df %>% 
  separate(time, into = c("start", "end"), sep = "-") %>% 
  mutate(across(c(start, end), ~str_c("2022-01-01T0", .x))) %>% 
  mutate(time = interval(start, end))

# filter and summarize (mean)
period1 <- interval("2022-01-01T00:30:00", "2022-01-01T00:30:30")
period2 <- interval("2022-01-01T00:37:00", "2022-01-01T00:37:30")

df %>% 
  filter(time %within% period1 | time %within% period2) %>% 
  summarize(across(c(n, average, se), mean))
#> # A tibble: 1 × 3
#>       n average    se
#>   <dbl>   <dbl> <dbl>
#> 1  7.23    62.6  12.8

Created on 2022-11-17 with reprex v2.0.2
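As for why the column stayed character in the question: as.numeric() returns a new vector and does not modify DF in place, so the result has to be assigned back to the column. Converting only elements 5:1350 cannot work either, because a vector holds a single type, so numbers assigned into a character column are turned back into text. Likewise, `:` only works on numbers, not on labels such as "0:30:00-0:30:30", but the labels can be matched with %in% or which(). A minimal base-R sketch on invented toy data (the header rows and values below are assumptions, not the real file):

```r
# Toy stand-in for the imported sheet; the first two rows mimic header text
DF <- data.frame(
  X.1 = c("", "Time", "0:29:30-0:30:00", "0:30:00-0:30:30",
          "0:30:30-0:31:00", "0:37:00-0:37:30"),
  Distance.moved.1 = c("Distance moved", "cm", "4", "10.5", "7", "20.5"),
  stringsAsFactors = FALSE
)

# Assign the converted vector back; text entries become NA.
# suppressWarnings() hides the "NAs introduced by coercion" warning.
DF$Distance.moved.1 <- suppressWarnings(as.numeric(DF$Distance.moved.1))
class(DF$Distance.moved.1)  # "numeric"

# Option 1: mean over exactly the two named time bins (X.1 stays character)
bins <- c("0:30:00-0:30:30", "0:37:00-0:37:30")
Mean1 <- mean(DF$Distance.moved.1[DF$X.1 %in% bins], na.rm = TRUE)
Mean1  # 15.5

# Option 2: mean over every row from the first bin through the second one
from <- which(DF$X.1 == "0:30:00-0:30:30")
to   <- which(DF$X.1 == "0:37:00-0:37:30")
MeanRange <- mean(DF$Distance.moved.1[from:to], na.rm = TRUE)
```

Which option is right depends on whether you want only those two bins or everything between them; the question can be read either way.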
