extract number in string using regex

Question

I have a data.frame like this:

SO <- data.frame(coiffure_IDF$SIREN, coiffure_IDF$L6_NORMALISEE)

  coiffure_IDF.SIREN  coiffure_IDF.L6_NORMALISEE
1 54805015            75008 PARIS
2 300086907           94210 ST MAUR DES FOSSES
3 300090453           94220 CHARENTON LE PONT
4 300209608           75007 PARIS
5 300570553           95880 ENGHIEN LES BAINS
6 301123626           75019 PARIS
7 301362349           92300 LEVALLOIS PERRET

I want to have this:

  coiffure_IDF.SIREN  codpos_norm  ville
1 54805015            75008        PARIS
2 300086907           94210        ST MAUR DES FOSSES
3 300090453           94220        CHARENTON LE PONT
4 300209608           75007        PARIS
5 300570553           95880        ENGHIEN LES BAINS
6 301123626           75019        PARIS
7 301362349           92300        LEVALLOIS PERRET

So I used this regex:

SO2 <- SO %>% extract(col="coiffure_IDF.L6_NORMALISEE", into=c("codpos_norm", "ville"), regex="(\\d+)\\s+(\\S+)")

This gives me the right "codpos_norm" column, but in "ville" I only get "ST" in row 2 instead of "ST MAUR DES FOSSES", only "CHARENTON" in row 3, and so on. I tried adding more \\s+ and \\S+ to the regex, but R told me that there are too many groups and that there must be exactly 2.

What can I do?

Answer 1

You need to match the rest of the string in Group 2. The \\S construct only matches non-whitespace characters, so \\S+ stops at the first space. Use .+ to match one or more of any characters up to the end of the string:

extract(col="coiffure_IDF.L6_NORMALISEE", into=c("codpos_norm", "ville"), regex="(\\d+)\\s+(.+)")
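As a quick check, here is a minimal sketch that applies this regex to three rows copied from the question. It assumes the tidyr package is installed; the data frame is rebuilt by hand for illustration and is not the asker's real coiffure_IDF data.

```r
library(tidyr)

# Sample rows copied from the question (hand-built, for illustration only)
SO <- data.frame(
  coiffure_IDF.SIREN = c(54805015, 300086907, 300090453),
  coiffure_IDF.L6_NORMALISEE = c("75008 PARIS",
                                 "94210 ST MAUR DES FOSSES",
                                 "94220 CHARENTON LE PONT"),
  stringsAsFactors = FALSE
)

# Group 1: the leading digits; Group 2: everything after the whitespace
SO2 <- extract(SO,
               col   = "coiffure_IDF.L6_NORMALISEE",
               into  = c("codpos_norm", "ville"),
               regex = "(\\d+)\\s+(.+)")

print(SO2)
```

With the default convert = FALSE, codpos_norm stays a character column, which is usually what you want for postal codes.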

You may use .* instead of .+ so that Group 2 can also match an empty string (when there is no text after the whitespace).
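The difference between .+ and .* only shows up when nothing follows the whitespace. The sketch below illustrates it in base R, without tidyr. The helper name ville_of and the third input string (a postal code followed by a single trailing space) are made up for this example.

```r
x <- c("75008 PARIS", "94210 ST MAUR DES FOSSES", "75007 ")

# Hypothetical helper: return the second capture group for each string,
# or NA when the pattern does not match at all.
ville_of <- function(pattern, x) {
  m <- regmatches(x, regexec(pattern, x, perl = TRUE))
  vapply(m,
         function(g) if (length(g) == 3) g[3] else NA_character_,
         character(1),
         USE.NAMES = FALSE)
}

with_plus <- ville_of("(\\d+)\\s+(.+)", x)  # .+ needs at least one character
with_star <- ville_of("(\\d+)\\s+(.*)", x)  # .* also accepts an empty remainder

print(with_plus)
print(with_star)
```

For the third string, the .+ version finds no match and yields NA, while the .* version matches and yields an empty string.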

1 answer

solution1
2 ACCEPTED 2018-08-03 10:24:50
