My data frame has a column called "State" and contains the state name, HB/HF number, and the date the law went into effect. I want the state column to only contain the state name and the second column to contain just the year. How would I do this?
Mintz = read.csv('https://github.com/bandcar/mintz/raw/main/State%20Legislation%20on%20Biosimilars2.csv')
mintz = Mintz
# delete rows if col 2 has a blank value.
mintz = mintz[mintz$Substitution.Requirements != "", ]
# removes entire row if column 1 has the word State
mintz=mintz[mintz$State != "State", ]
#reset row numbers
mintz= mintz %>% data.frame(row.names = 1:nrow(.))
# delete PR
mintz = mintz[-34,]
#reset row numbers
mintz= mintz %>% data.frame(row.names = 1:nrow(.))
I'm almost certain I'll need to use strsplit(gsub())
but I'm not sure how to this since there's no specific pattern
EDIT
I still need help keeping only the state name in column 1.
As for moving the year to a new column, I found the below. It works, but I don't know why it works. From my understanding \d means that \d is the actual character it's searching for. the "." means to search for one character, and I have no idea what the \1 means. Another strange thing is that Minnesota (row 20) did not have a year, so it instead used characters. Isn't \d only supposed to be for digits? Someone care to explain?
mintz2 = mintz
mintz2$Year = sub('.*(\\d{4}).*', '\\1', mintz2$State)
One way could be:
For demonstration purposes select
the State
column.
Then we use str_extract
to extract all numbers with 4 digits with that are at the end of the string \\d{4}
-> this gives us the Year
column.
Finally we make use of the inbuilt state.name
function make a pattern of it an use it again with str_extract
and remove NA rows.
library(dplyr)
library(stringr)
mintz %>%
select(State) %>%
mutate(Year = str_extract(State, '\\d{4}$'), .after=State,
State = str_extract(State, paste(state.name, collapse='|'))
) %>%
na.omit()
State Year
2 Arizona 2016
3 California 2016
7 Connecticut 2018
12 Florida 2013
13 Georgia 2015
16 Hawaii 2016
21 Illinois 2016
24 Indiana 2014
28 Iowa 2017
32 Kansas 2017
33 Kentucky 2016
34 Louisiana 2015
39 Maryland 2017
42 Michigan 2018
46 Missouri 2016
47 Montana 2017
50 Nebraska 2018
51 Nevada 2018
54 New Hampshire 2018
55 New Jersey 2016
59 New York 2017
62 North Carolina 2015
63 North Dakota 2013
66 Ohio 2017
67 Oregon 2016
70 Pennsylvania 2016
74 Rhode Island 2016
75 South Carolina 2017
78 South Dakota 2019
79 Tennessee 2015
82 Texas 2015
85 Utah 2015
88 Vermont 2018
89 Virginia 2013
92 Washington 2015
93 West Virginia 2018
96 Wisconsin 2019
97 Wyoming 2018
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.