简体   繁体   中英

extract number in string using regex

I have a data.frame like this : SO <- data.frame(coiffure_IDF$SIREN, coiffure_IDF$L6_NORMALISEE )

  coiffure_IDF.SIREN    coiffure_IDF.L6_NORMALISEE

1 54805015            75008 PARIS

2 300086907           94210 ST MAUR DES FOSSES

3 300090453           94220 CHARENTON LE PONT

4 300209608           75007 PARIS

5 300570553           95880 ENGHIEN LES BAINS

6 301123626           75019 PARIS

7 301362349           92300 LEVALLOIS PERRET

I want to have this :

  coiffure_IDF.SIREN    codpos_norm     ville

1 54805015            75008             PARIS

2 300086907           94210           ST MAUR DES FOSSES

3 300090453           94220           CHARENTON LE PONT

4 300209608           75007            PARIS

5 300570553           95880            ENGHIEN LES BAINS

6 301123626           75019             PARIS

7 301362349           92300             LEVALLOIS PERRET

so I used regex : SO2<- SO %>% extract(col="coiffure_IDF.L6_NORMALISEE", into=c("codpos_norm", "ville"), regex="(\\\\d+)\\\\s+(\\\\S+)")

so I have the right column is "codpos_norm" but in "ville" in line 2 I just have "ST" in stead of "ST MAUR DES FOSSES". In line 3 just "CHARENTON", etc so I tried to add some \\\\s+ and \\\\S+ in the regex but R told me that they are to many groups and that it has to have only 2 groups.

What could I do ?

You need to match the rest of the string in Group 2, the \\S construct only matches non-whitespace chars. Use .+ to match any 1+ chars up to the string end:

extract(col="coiffure_IDF.L6_NORMALISEE", into=c("codpos_norm", "ville"), regex="(\\d+)\\s+(.+)")

You may use .* to match empty strings (if there is no text after 1+ whitespaces).

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM