Can someone give me some advice? I am trying to compare two columns. One column contains an address string and the other is a lookup table of country names. Some country names in the addresses are in English, and I want to replace them with the German term. I am also very limited in which additional packages I can use, since I want to run the script in a database. My code doesn't really work: it only replaces one row.
df1
DE
Europa | Deutschland | München
Europa | England | London
Europa | Germany | Berlin
Europa | Italy | Venedig
df2
GE EN
Deutschland Germany
Italien Italy
England UK
Expected result: df1
DE
Europa | Deutschland | München
Europa | England | London
Europa | Deutschland | Berlin
Europa | Italien | Venedig
I tried the following code:
df1 <- data.frame("DE" = c("Europa | Deutschland | München", "Europa | England | London", "Europa | Germany | Berlin ", "Europa | Italy | Venedig"))
df2 <- data.frame("GE" = c("Deutschland", "Italien", "England"), "EN" = c("Germany", "Italy", "UK"))
df1[] <- lapply(df1, as.character)
df2[] <- lapply(df2, as.character)
for(i in seq_along(df1)) df1$DE <- gsub(df2$EN, df2$GE, df1$DE, fixed = FALSE)
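You should index with [i] inside the for loop, and loop over the rows of df2 instead of over df1 (seq_along(df1) has length 1, because the length of a data frame is its number of columns). Also use fixed = TRUE, because you are matching fixed strings, not regular expressions. Without [i], gsub() uses only the first element of df2$EN and df2$GE (with a warning), which is why only one row changed. The corrected code: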
for(i in seq_along(df2$EN)) {
df1$DE <- gsub(df2$EN[i], df2$GE[i], df1$DE, fixed = TRUE)
}
df1$DE
## [1] "Europa | Deutschland | München"
## [2] "Europa | England | London"
## [3] "Europa | Deutschland | Berlin "
## [4] "Europa | Italien | Venedig"
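To see why the original loop changed only one row, here is a small check using the question's data (res is just an illustrative name). It relies on two documented behaviours: the length of a data frame is its number of columns, and gsub() uses only the first element of a vector pattern or replacement, with a warning.

```r
df1 <- data.frame(DE = c("Europa | Deutschland | München", "Europa | England | London",
                         "Europa | Germany | Berlin ", "Europa | Italy | Venedig"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(GE = c("Deutschland", "Italien", "England"),
                  EN = c("Germany", "Italy", "UK"),
                  stringsAsFactors = FALSE)

# A data frame's length is its number of columns, so this is just 1:
# the loop in the question runs its body only once.
seq_along(df1)

# gsub() takes a single pattern; given a vector it uses only the first element
# ("Germany" -> "Deutschland") and warns, so "Italy" is never replaced.
res <- suppressWarnings(gsub(df2$EN, df2$GE, df1$DE))
res[3:4]  # "Europa | Deutschland | Berlin " "Europa | Italy | Venedig"
```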
P.S. You can pass stringsAsFactors = FALSE to data.frame() to get character columns instead of factors (since R 4.0.0 this is the default):
df1 <- data.frame("DE" = c("Europa | Deutschland | München",
"Europa | England | London",
"Europa | Germany | Berlin ",
"Europa | Italy | Venedig"),
stringsAsFactors = FALSE)
df2 <- data.frame("GE" = c("Deutschland", "Italien", "England"),
"EN" = c("Germany", "Italy", "UK"),
stringsAsFactors = FALSE)
Here is a solution based on merge() and ifelse(). I split the column because I only want to replace the names in the second field (the country). If we use gsub() in a for loop on the whole string, matching words in the other fields (continent or city) may also be replaced. df4 is the final output.
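A minimal sketch of that risk, assuming a hypothetical extra address for the town of Italy, Texas (the row, and the names whole and field_only, are made up for illustration). The whole-string loop also rewrites the city, while translating only the second field leaves it alone. The field-only version here uses match() as a lookup instead of merge().

```r
df2 <- data.frame(GE = c("Deutschland", "Italien", "England"),
                  EN = c("Germany", "Italy", "UK"),
                  stringsAsFactors = FALSE)

# Second row is a hypothetical address whose city field happens to be "Italy"
addr <- c("Europa | Italy | Venedig", "Amerika | USA | Italy")

# Whole-string loop (as in the first answer): also hits the city field
whole <- addr
for (i in seq_along(df2$EN)) {
  whole <- gsub(df2$EN[i], df2$GE[i], whole, fixed = TRUE)
}
whole  # "Europa | Italien | Venedig" "Amerika | USA | Italien"

# Field-restricted: split on " | ", translate only the 2nd field, paste back
parts <- strsplit(addr, " | ", fixed = TRUE)
field_only <- vapply(parts, function(p) {
  hit <- match(p[2], df2$EN)          # NA when the country is not in the table
  if (!is.na(hit)) p[2] <- df2$GE[hit]
  paste(p, collapse = " | ")
}, character(1))
field_only  # "Europa | Italien | Venedig" "Amerika | USA | Italy"
```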
Step 1: Split the DE column of df1 at each | and trim the surrounding whitespace.
df1_1 <- as.data.frame(do.call(rbind, lapply(strsplit(df1$DE, split = "\\|"), trimws)),
stringsAsFactors = FALSE)
Step 2: Merge df1_1 and df2.
df3 <- merge(df1_1, df2, by.x = "V2", by.y = "EN", all.x = TRUE)
Step 3: Replace the values in V2 wherever the GE column is not NA.
df3$V2 <- ifelse(!is.na(df3$GE), df3$GE, df3$V2)
Step 4: Paste the columns back together and prepare the final output.
df3$DE <- apply(df3[, c("V1", "V2", "V3")], 1, paste, collapse = " | ")
df4 <- df3[, "DE", drop = FALSE]
df4
# DE
# 1 Europa | Deutschland | München
# 2 Europa | England | London
# 3 Europa | Deutschland | Berlin
# 4 Europa | Italien | Venedig
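For Step 4, a vectorised alternative to apply() is to paste the columns together element-wise with do.call(paste, ...). A minimal sketch, assuming a small hypothetical df3 that already holds the translated V2 column: apply() converts the data frame to a character matrix and works row by row, while paste() handles whole columns in one call. unname() is there only because apply() can carry row names over.

```r
# Hypothetical stand-in for df3 after Step 3 (only the three split columns matter)
df3 <- data.frame(V1 = c("Europa", "Europa"),
                  V2 = c("Deutschland", "Italien"),
                  V3 = c("Berlin", "Venedig"),
                  stringsAsFactors = FALSE)

# Row-wise, as in Step 4
via_apply <- apply(df3[, c("V1", "V2", "V3")], 1, paste, collapse = " | ")

# Vectorised: paste() the columns element-wise in a single call
via_paste <- do.call(paste, c(df3[c("V1", "V2", "V3")], sep = " | "))

identical(unname(via_apply), via_paste)  # TRUE
```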
DATA
df1 <- data.frame("DE" = c("Europa | Deutschland | München", "Europa | England | London", "Europa | Germany | Berlin ", "Europa | Italy | Venedig"),
stringsAsFactors = FALSE)
df2 <- data.frame("GE" = c("Deutschland", "Italien", "England"),
"EN" = c("Germany", "Italy", "UK"),
stringsAsFactors = FALSE)
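One caveat with Step 2: merge() sorts its result by the key column (sort = TRUE is the default), so df4 only happens to come out in df1's order with this sample data. A sketch of one way to keep the original order, assuming a hypothetical helper column row_id added before merging and sorted on afterwards. The two-row df1 here is made up so that the sort actually changes the order.

```r
# Hypothetical input where merge()'s sorting would reorder the rows
df1 <- data.frame(DE = c("Europa | Italy | Venedig", "Europa | Germany | Berlin"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(GE = c("Deutschland", "Italien", "England"),
                  EN = c("Germany", "Italy", "UK"),
                  stringsAsFactors = FALSE)

df1_1 <- as.data.frame(do.call(rbind, lapply(strsplit(df1$DE, split = "\\|"), trimws)),
                       stringsAsFactors = FALSE)
df1_1$row_id <- seq_len(nrow(df1_1))   # remember df1's row order

df3 <- merge(df1_1, df2, by.x = "V2", by.y = "EN", all.x = TRUE)  # sorted by V2
df3 <- df3[order(df3$row_id), ]                                   # restore df1's order
df3$V2 <- ifelse(!is.na(df3$GE), df3$GE, df3$V2)

df4 <- data.frame(DE = do.call(paste, c(df3[c("V1", "V2", "V3")], sep = " | ")),
                  stringsAsFactors = FALSE)
df4$DE  # "Europa | Italien | Venedig" "Europa | Deutschland | Berlin"
```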