I have a data.frame like this : SO <- data.frame(coiffure_IDF$SIREN, coiffure_IDF$L6_NORMALISEE )
coiffure_IDF.SIREN coiffure_IDF.L6_NORMALISEE
1 54805015 75008 PARIS
2 300086907 94210 ST MAUR DES FOSSES
3 300090453 94220 CHARENTON LE PONT
4 300209608 75007 PARIS
5 300570553 95880 ENGHIEN LES BAINS
6 301123626 75019 PARIS
7 301362349 92300 LEVALLOIS PERRET
I want to have this :
coiffure_IDF.SIREN codpos_norm ville
1 54805015 75008 PARIS
2 300086907 94210 ST MAUR DES FOSSES
3 300090453 94220 CHARENTON LE PONT
4 300209608 75007 PARIS
5 300570553 95880 ENGHIEN LES BAINS
6 301123626 75019 PARIS
7 301362349 92300 LEVALLOIS PERRET
so I used regex : SO2<- SO %>% extract(col="coiffure_IDF.L6_NORMALISEE", into=c("codpos_norm", "ville"), regex="(\\\\d+)\\\\s+(\\\\S+)")
so I have the right column is "codpos_norm" but in "ville" in line 2 I just have "ST" in stead of "ST MAUR DES FOSSES". In line 3 just "CHARENTON", etc so I tried to add some \\\\s+
and \\\\S+
in the regex but R told me that they are to many groups and that it has to have only 2 groups.
What could I do ?
You need to match the rest of the string in Group 2, the \\S
construct only matches non-whitespace chars. Use .+
to match any 1+ chars up to the string end:
extract(col="coiffure_IDF.L6_NORMALISEE", into=c("codpos_norm", "ville"), regex="(\\d+)\\s+(.+)")
You may use .*
to match empty strings (if there is no text after 1+ whitespaces).
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.