How to split LaTeX acronym definitions into an R data frame

Question

I have a LaTeX file with my acronym definitions, like this:

\newacronym{AEP}{AEP}{Alimentation en Eau Potable}
\newacronym{AERMC}{AERMC}{Agence de l'Eau Rhône Méditerranée et Corse}
\newacronym[longplural=Cotes d'Abondance Numériques]{CAN}{CAN}{Cote d'Abondance Numérique}

My aim is to get a data frame with two columns, like this:

AEP     Alimentation en Eau Potable
AERMC   Agence de l'Eau Rhône Méditerranée et Corse
CAN     Cote d'Abondance Numérique

I think this is possible with a regex or strsplit, but I can't work it out; the braces ({ and }) cause a lot of problems. Here is my attempt so far:

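library(readr)
library(dplyr)

acronymes <- read_lines("acronymes.tex")
acronymes <- as_tibble(data.frame(acronymes, stringsAsFactors = FALSE))
acronymes <- acronymes %>%
    rename(Complet = acronymes) %>%
    filter(!grepl("^%", Complet)) # drop commented-out lines, which I don't use
acronymes$ABR <- sub("}.*", "", acronymes$Complet) # keeps everything before the first }, e.g. "\newacronym{AEP"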

Do you have any ideas, or can you point me to a clear guide to regular expressions? Thank you.

Answer 1

Maybe not the most elegant solution, but this works. You need to escape the braces with a double backslash (one backslash for the regex, doubled because it is written inside an R string):

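a <- readLines("acronymes.tex")
a <- a[grepl("^\\\\newacronym", a)]  # keep only the definitions (drops % comments and blank lines)
# short form: the text between the two "}{" separators
acronyms <- gsub(".*\\}\\{(.*)\\}\\{.*", "\\1", a)
# description: the text between the last "}{" and the closing "}"
descriptions <- gsub(".*\\}\\{(.*)\\}$", "\\1", a)
data.frame(acronyms, descriptions)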
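
As a possible alternative, here is a minimal sketch that parses each definition with a single pattern, using base R's regexec() and regmatches(). It assumes every definition sits on one line in the form \newacronym[options]{label}{short}{long}, with no braces inside the label or short form. The sample lines, the pattern and the name acronyms_df are illustrative, and the column names ABR and Complet are borrowed from the question.

```r
# Sample lines standing in for readLines("acronymes.tex")
lines <- c(
  "% a commented-out line",
  "\\newacronym{AEP}{AEP}{Alimentation en Eau Potable}",
  "\\newacronym{AERMC}{AERMC}{Agence de l'Eau Rhône Méditerranée et Corse}",
  "\\newacronym[longplural=Cotes d'Abondance Numériques]{CAN}{CAN}{Cote d'Abondance Numérique}"
)

# Whole command: optional [options], then {label}{short}{long}
pattern <- "^\\\\newacronym(\\[[^]]*\\])?\\{([^}]*)\\}\\{([^}]*)\\}\\{(.*)\\}[[:space:]]*$"

# regexec() gives the full match followed by each capture group;
# lines that do not match (comments, blanks) yield character(0)
matches <- regmatches(lines, regexec(pattern, lines))
matches <- matches[lengths(matches) > 0]

# Element 1 is the full match, 2 the options, 3 the label,
# 4 the short form and 5 the long form
acronyms_df <- data.frame(
  ABR = vapply(matches, function(m) m[4], character(1)),
  Complet = vapply(matches, function(m) m[5], character(1)),
  stringsAsFactors = FALSE
)
print(acronyms_df)
```

Because the pattern is anchored on \newacronym, commented-out and blank lines are dropped without a separate filter step. The last group is greedy up to the final }, so a description that itself contains braces should still be captured whole, although that case is not covered by the sample data.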

solution1
0 ACCEPTED 2016-10-18 16:09:03
