
Converting data.frame from character to numeric in R to use in Time Series function

I am currently using R (3.2.1) and am having problems converting my dataset to numeric so that I can plot my time series graph.

I read my data table from an HTML page source and have stored it in my global environment. I can't convert my data.frame from character to numeric. Here is the head of my data:

> head(World)
    World  
V3 "5,689"
V4 "4,672"
V5 "4,344"
V6 "3,745"
V7 "4,246"
V8 "4,823"

This is the structure of my data:

> str(World)
 'data.frame':  108 obs. of  1 variable:
 $ World: chr  "1,234" "1,234" "1,234" "4,321" ...

I would like to convert this data to a time series; however,

ts(as.data.frame(sapply(World, function(x) gsub("\"", "", x))))

gives me small integer values instead of my numbers, such as

Time Series:
Start = 1 
End = 6 
Frequency = 1 
     World
[1,]    49
[2,]    41
[3,]    37
[4,]    32
[5,]    36
[6,]    43
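A likely explanation for these small integers, offered as a sketch: when ts() receives a data.frame it passes it through data.matrix(), and data.matrix() turns character (and factor) columns into factors and then into their integer level codes. The three-row data.frame `df` below is a hypothetical stand-in built from values in head(World).

```r
# Hypothetical sample: three of the values from head(World), kept as character
df <- data.frame(World = c("5,689", "4,672", "4,344"), stringsAsFactors = FALSE)

# ts() calls data.matrix() on a data.frame; data.matrix() converts the
# character column to a factor and then to its integer level codes
codes <- ts(df)
print(codes)

# The codes are the positions of the strings in sorted order, not the numbers
print(levels(factor(df$World)))
```

So the values do not "change" at random; each one is replaced by its rank among the distinct strings.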

I have tried

 as.numeric(as.character(World[,1]))

but it returned NA values with the warning message: NAs introduced by coercion.

I can see the values of World without quotes, but when I convert them to a time series, the values change.

I would like my end product to be

Time Series:
Start = 1 
End = 6 
Frequency = 1 
     World
[1,]    5,689
[2,]    4,672
[3,]    4,344
[4,]    3,745
[5,]    4,246
[6,]    4,823

I would appreciate any help given.

Thanks

The warning appears because your "numbers" contain commas, which as.numeric() cannot parse, so it returns NA for them. Remove the commas (or convert them to periods, if they are meant as decimal separators) and the conversion to numeric will work.
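A minimal sketch of that diagnosis, using three of the values from the question as a hypothetical input vector `x` and assuming the commas are thousands separators:

```r
x <- c("5,689", "4,672", "4,344")

# Commas are not accepted by R's number parsing, so every element becomes NA
# (suppressWarnings() hides the "NAs introduced by coercion" warning here)
bad <- suppressWarnings(as.numeric(x))
print(bad)

# Removing the thousands separators first lets the conversion succeed
good <- as.numeric(gsub(",", "", x, fixed = TRUE))
print(good)

# Any value still NA after cleaning would need a closer look
print(x[is.na(good)])
```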

Also, your World object doesn't appear to be a data.frame, because data.frames don't print character columns with quotes. More likely, it's a matrix.
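To check which shape World actually has, here is a small sketch with two hypothetical stand-ins, `World_mat` and `World_df`, holding the same values. Printing them side by side shows the quoting difference, and class() settles the question directly:

```r
# Hypothetical stand-ins for the two shapes World might have
vals <- c("5,689", "4,672", "4,344")
World_mat <- matrix(vals, dimnames = list(paste0("V", 3:5), "World"))
World_df  <- data.frame(World = vals, row.names = paste0("V", 3:5),
                        stringsAsFactors = FALSE)

print(World_mat)  # character matrix: values printed with quotes
print(World_df)   # data.frame: values printed without quotes

print(class(World_mat))  # "matrix" (plus "array" in R >= 4.0.0)
print(class(World_df))   # "data.frame"

# World[, 1] yields the same character values for either shape
# (the matrix version carries row names, so unname() is used to compare)
print(identical(unname(World_mat[, 1]), World_df[, 1]))
```

Because World[, 1] works for both shapes, the conversion below does not depend on which one World turns out to be.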

R> # if the comma is a thousands separator
R> ts(as.matrix(as.numeric(gsub(",", "", World[,1]))))
Time Series:
Start = 1 
End = 6 
Frequency = 1 
     Series 1
[1,]     5689
[2,]     4672
[3,]     4344
[4,]     3745
[5,]     4246
[6,]     4823
R> # if the comma is a decimal separator
R> ts(as.matrix(as.numeric(gsub(",", ".", World[,1]))))
Time Series:
Start = 1 
End = 6 
Frequency = 1 
     Series 1
[1,]    5.689
[2,]    4.672
[3,]    4.344
[4,]    3.745
[5,]    4.246
[6,]    4.823
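If World really is a one-column data.frame, as str() reports, a sketch that converts the column in place keeps the column name "World" in the ts output instead of "Series 1". It assumes the commas are thousands separators, and the data.frame below is a hypothetical reconstruction of the six rows shown in the question:

```r
# Hypothetical reconstruction of the first six rows from the question
World <- data.frame(
  World = c("5,689", "4,672", "4,344", "3,745", "4,246", "4,823"),
  stringsAsFactors = FALSE
)

# Strip the thousands separators, then convert the column to numeric
World$World <- as.numeric(gsub(",", "", World$World, fixed = TRUE))

# ts() now receives a numeric column, so data.matrix() leaves the values
# unchanged and the column name "World" is carried over
World_ts <- ts(World)
print(World_ts)
```

The printed series contains plain numbers without commas. If separators are wanted for display only, format(World$World, big.mark = ",") returns them as character strings, which should not be fed back into ts().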
