I have scraped the following tables from Wikipedia using the XML package:
http://en.wikipedia.org/wiki/2014_FIFA_World_Cup_squads
On the webpage, the dob (date of birth) column shows values such as: 4 January 1985 (aged 29)
In my R data frame, however, the same value is read in as: (1985-01-04)4 January 1985 (aged 29)
In the scraped data, R treats this column as a factor, not a date.
I am trying to create a variable that holds just the date of birth in YYYY-MM-DD format, but I am having trouble reformatting the dob variable that way.
I've tried the following without success (my data frame is called alpha):
alpha$newvar <- as.Date(alpha$dob, "%Y%m%d")
alpha$newvar <- strptime(alpha$dob,format="%Y%m%d")
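A minimal sketch of why both attempts return NA, using one dob value from the sample data further down as a plain character string (the variable name `x` is only for illustration). The format string is matched against the input from its first character, so the leading "(" already breaks "%Y%m%d", and the hyphens would break it too.

```r
# One dob value exactly as it appears after scraping
x <- "(1985-01-04)4 January 1985 (aged 29)"

# The string starts with "(", so %Y cannot match: the result is NA
attempt1 <- as.Date(x, "%Y%m%d")

# strptime() fails for the same reason and also gives NA
attempt2 <- strptime(x, format = "%Y%m%d")

# Even without the "(", "%Y%m%d" expects no hyphens between the parts: NA
attempt3 <- as.Date("1985-01-04", "%Y%m%d")

print(attempt1)
print(attempt2)
print(attempt3)
```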
Here is sample data for the South Korean squad, as produced by dput():
structure(list(no = structure(c(1L, 12L, 17L, 18L, 19L, 20L,
21L, 22L, 23L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 13L,
14L, 15L, 16L), .Label = c("1", "10", "11", "12", "13", "14",
"15", "16", "17", "18", "19", "2", "20", "21", "22", "23", "3",
"4", "5", "6", "7", "8", "9"), class = "factor"), pos = structure(c(1L,
2L, 2L, 2L, 2L, 2L, 3L, 3L, 4L, 4L, 4L, 2L, 3L, 3L, 3L, 3L, 3L,
4L, 4L, 2L, 1L, 2L, 1L), .Label = c("1GK", "2DF", "3MF", "4FW"
), class = "factor"), player = structure(c(6L, 9L, 23L, 14L,
12L, 4L, 8L, 1L, 22L, 19L, 17L, 18L, 13L, 2L, 20L, 7L, 16L, 11L,
5L, 3L, 10L, 21L, 15L), .Label = c("Ha Dae-sung", "Han Kook-young",
"Hong Jeong-ho", "Hwang Seok-ho", "Ji Dong-won", "Jung Sung-ryong",
"Ki Sung-yueng", "Kim Bo-kyung", "Kim Chang-soo", "Kim Seung-gyu",
"Kim Shin-wook", "Kim Young-gwon", "Koo Ja-cheol (c)", "Kwak Tae-hwi",
"Lee Bum-young", "Lee Chung-yong", "Lee Keun-ho", "Lee Yong",
"Park Chu-young", "Park Jong-woo", "Park Joo-ho[67]", "Son Heung-min",
"Yun Suk-young"), class = "factor"), dob = structure(c(2L, 6L,
18L, 1L, 19L, 15L, 17L, 3L, 23L, 5L, 4L, 7L, 12L, 20L, 13L, 11L,
10L, 9L, 22L, 16L, 21L, 8L, 14L), .Label = c("(1981-07-08)8 July 1981 (aged 32)",
"(1985-01-04)4 January 1985 (aged 29)", "(1985-03-02)2 March 1985 (aged 29)",
"(1985-04-11)11 April 1985 (aged 29)", "(1985-07-10)10 July 1985 (aged 28)",
"(1985-09-12)12 September 1985 (aged 28)", "(1986-12-24)24 December 1986 (aged 27)",
"(1987-01-16)16 January 1987 (aged 27)", "(1988-04-14)14 April 1988 (aged 26)",
"(1988-07-02)2 July 1988 (aged 25)", "(1989-01-24)24 January 1989 (aged 25)",
"(1989-02-27)27 February 1989 (aged 25)", "(1989-03-10)10 March 1989 (aged 25)",
"(1989-04-02)2 April 1989 (aged 25)", "(1989-06-27)27 June 1989 (aged 24)",
"(1989-08-12)12 August 1989 (aged 24)", "(1989-10-06)6 October 1989 (aged 24)",
"(1990-02-13)13 February 1990 (aged 24)", "(1990-02-27)27 February 1990 (aged 24)",
"(1990-04-19)19 April 1990 (aged 24)", "(1990-09-30)30 September 1990 (aged 23)",
"(1991-05-28)28 May 1991 (aged 23)", "(1992-07-08)8 July 1992 (aged 21)"
), class = "factor"), caps = structure(c(17L, 20L, 13L, 11L,
6L, 10L, 9L, 4L, 7L, 19L, 18L, 3L, 12L, 2L, 2L, 16L, 15L, 8L,
9L, 7L, 14L, 5L, 1L), .Label = c("0", "10", "12", "13", "14",
"21", "25", "27", "28", "3", "35", "37", "4", "5", "55", "58",
"61", "63", "64", "9"), class = "factor"), club = structure(c(16L,
10L, 12L, 1L, 8L, 13L, 6L, 3L, 2L, 18L, 14L, 17L, 11L, 10L, 9L,
15L, 4L, 17L, 7L, 7L, 17L, 11L, 5L), .Label = c("Al-Hilal", "Bayer Leverkusen",
"Beijing Guoan", "Bolton Wanderers", "Busan IPark", "Cardiff City",
"FC Augsburg", "Guangzhou Evergrande", "Guangzhou R&F", "Kashiwa Reysol",
"Mainz 05", "Queens Park Rangers", "Sanfrecce Hiroshima", "Sangju Sangmu",
"Sunderland", "Suwon Bluewings", "Ulsan Hyundai", "Watford"), class = "factor")), .Names = c("no",
"pos", "player", "dob", "caps", "club"), row.names = c(NA, -23L
), class = "data.frame")
I can answer my own question. The problem was that the format string has to describe the text exactly as it appears, including the brackets around the date (and the hyphens between year, month and day).
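So this works:
alpha$newvar <- as.character(strptime(alpha$dob, format = "(%Y-%m-%d)"))
Using "(%Y-%m-%d)" as the format tells R to match the opening bracket literally, read the date in YYYY-MM-DD form, and then match the closing bracket. strptime ignores any characters after that, so the "4 January 1985 (aged 29)" part does not get in the way.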
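A small self-contained check of the fix, using three dob values from the sample data stored as a factor, as in the scraped data frame. It also shows two variants that are my own assumptions rather than part of the answer above: `as.Date()` with the same format (which returns a `Date` object instead of a character string), and a regular-expression approach with `sub()` that extracts the text between the brackets. The name `dob` here is a standalone vector, not the `alpha$dob` column.

```r
# Three values copied from the sample data, stored as a factor
dob <- factor(c("(1985-01-04)4 January 1985 (aged 29)",
                "(1981-07-08)8 July 1981 (aged 32)",
                "(1992-07-08)8 July 1992 (aged 21)"))

# The fix from the answer: "(" and ")" are matched literally and
# strptime ignores everything after the closing bracket
as_chr <- as.character(strptime(dob, format = "(%Y-%m-%d)"))

# Variant (assumption): as.Date() accepts the same format and returns a
# Date object, which sorts and subtracts as a date rather than as text
as_date <- as.Date(dob, format = "(%Y-%m-%d)")

# Variant (assumption): extract the ISO date text with a regular
# expression, then convert it with as.Date()'s default "%Y-%m-%d" format
iso <- sub("^\\(([0-9]{4}-[0-9]{2}-[0-9]{2})\\).*$", "\\1", as.character(dob))
as_date_re <- as.Date(iso)

print(as_chr)
print(as_date)
print(as_date_re)
```

Keeping the result as a `Date` rather than a character string is usually more convenient if you later want to sort players by age or compute age differences.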