Extracting characters from string
The structure of the dataset is:
> str(trainData)
'data.frame': 891 obs. of 13 variables:
$ PassengerId: int 1 2 3 4 5 6 7 8 9 10 ...
$ Survived : Factor w/ 2 levels "No","Yes": 1 2 2 2 1 1 1 1 2 2 ...
$ Pclass : Factor w/ 3 levels "1st","2nd","3rd": 3 1 3 1 3 3 1 3 3 2 ...
$ Name : chr "Braund, Mr. Owen Harris" "Cumings, Mrs. John Bradley (Florence Briggs Thayer)" "Heikkinen, Miss. Laina" "Futrelle, Mrs. Jacques Heath (Lily May Peel)" ...
$ Sex : Factor w/ 2 levels "Male","Female": 1 2 2 2 1 1 1 1 2 2 ...
$ Age : num 22 38 26 35 35 NA 54 2 27 14 ...
$ SibSp : int 1 1 0 1 0 0 0 3 0 1 ...
$ Parch : int 0 0 0 0 0 0 0 1 2 0 ...
$ Ticket : int NA NA NA 113803 373450 330877 17463 349909 347742 237736 ...
$ Fare : num 7.25 71.28 7.92 53.1 8.05 ...
$ Cabin : chr "" "C85" "" "C123" ...
$ Embarked : chr "S" "C" "S" "S" ...
$ Area : Factor w/ 9 levels "","A","B","C",..: 1 4 1 4 1 1 6 1 1 1 ...
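I would like to create a new column in the data frame to store the form of address (title) contained in the Name variable. To do that, I need to extract the strings "Mr", "Mrs", etc. and store them in a new vector. I tried to solve the problem as follows.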
vec <- vector()
for (i in 1 : nrow(trainData)) {
if (grep("Mr\\.", trainData[i, "Name"]) == 1) {vec[i] <- "Mr"}
else if (grep("Miss\\.", trainData[i, "Name"]) == 1) {vec[i] <- "Miss"}
else if (grep("Mrs\\.", trainData[i, "Name"]) == 1) {vec[i] <- "Mrs"}
else if (grep("Don\\.", trainData[i, "Name"]) == 1) {vec[i] <- "Don"}
else if (grep("Master\\.", trainData[i, "Name"]) == 1) {vec[i] <- "Master"}
else {vec[i] <- "Boh"}
}
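... and then use the cbind function to combine the existing data frame with the new column FormOfAddress. I have not tested the next two lines of code, because the previous block already gives me an error message.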
trainData <- as.data.frame(cbind(trainData, vec))
names(trainData)[length(trainData)] <- "FormOfAddress"
Basically, this is the point where I am stuck:
> vec <- vector()
> for (i in 1 : nrow(trainData)) {
+ if (grep("Mr\\.", trainData[i, c("Name")]) == 1) {vec[i] <- "Mr"}
+ else if (grep("Miss\\.", trainData[i, c("Name")]) == 1) {vec[i] <- "Miss"}
+ else if (grep("Mrs\\.", trainData[i, c("Name")]) == 1) {vec[i] <- "Mrs"}
+ else if (grep("Don\\.", trainData[i, c("Name")]) == 1) {vec[i] <- "Don"}
+ else if (grep("Master\\.", trainData[i, c("Name")]) == 1) {vec[i] <- "Master"}
+ else {vec[i] <- "Boh"; next}
+ }
Error in if (grep("Mr\\.", trainData[i, c("Name")]) == 1) { :
argument is of length zero
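The first part of the if statement looks correct to me: when the string Mr. is contained in Name, it returns TRUE. The second part also looks fine (at least in the first iteration) and writes the string Mr to the vector vec. I think the problem occurs in the second iteration, but I can't find a way to make it work.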
trainData$Name
## [1] "Braund, Mr. Owen Harris"
## [2] "Cumings, Mrs. John Bradley (Florence Briggs Thayer)"
## [3] "Heikkinen, Miss. Laina"
## [4] "Futrelle, Mrs. Jacques Heath (Lily May Peel)"
## [5] "tt"
## [6] "Mr. Jones"
for (x in trainData$Name) {
print(grep("Mr\\.", x))
print(grepl("Mr\\.", x));
}
## [1] 1
## [1] TRUE
## integer(0)
## [1] FALSE
## integer(0)
## [1] FALSE
## integer(0)
## [1] FALSE
## integer(0)
## [1] FALSE
## [1] 1
## [1] TRUE
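The output above suggests why the original loop stops: for a name without a match, grep() returns integer(0), so the comparison with 1 has length zero and if() cannot evaluate it, whereas grepl() always returns a single TRUE or FALSE. A minimal sketch of the loop rewritten with grepl() follows. The names names_vec and titles are hypothetical and introduced only for illustration, and the sample data is limited to the six names printed above.

```r
# Hypothetical sample data: the six names printed above
names_vec <- c("Braund, Mr. Owen Harris",
               "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
               "Heikkinen, Miss. Laina",
               "Futrelle, Mrs. Jacques Heath (Lily May Peel)",
               "tt",
               "Mr. Jones")

# Titles checked in the same order as the original if / else if chain
titles <- c("Mr", "Miss", "Mrs", "Don", "Master")

vec <- character(length(names_vec))
for (i in seq_along(names_vec)) {
  vec[i] <- "Boh"  # default when no title matches
  for (title in titles) {
    # grepl() returns TRUE or FALSE (never a zero-length result),
    # so it is safe to use as an if() condition
    if (grepl(paste0(title, "\\."), names_vec[i])) {
      vec[i] <- title
      break
    }
  }
}
print(vec)
```

The pattern is built with paste0(), so "Mr" becomes the regular expression Mr\. and does not match "Mrs.", which keeps the order of the checks harmless for these titles.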
## Doing it without a loop.
## You might have to come up with a different
## regex here depending on the rest of your data
vec <- gsub("^([^,]+, )?([^.]+).*", "\\2", trainData$Name)
## [1] "Mr" "Mrs" "Miss" "Mrs" "tt" "Mr"
vec <- ifelse(vec == trainData$Name, "Boh", vec)
## [1] "Mr" "Mrs" "Miss" "Mrs" "Boh" "Mr"
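To finish the task from the question, the vector can be attached to the data frame by direct assignment, which avoids the cbind() and names() steps. This is only a sketch: the small trainData built here is a hypothetical stand-in containing just the six sample names, not the full 891-row dataset.

```r
# Hypothetical stand-in for trainData with only the sample names
trainData <- data.frame(
  Name = c("Braund, Mr. Owen Harris",
           "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
           "Heikkinen, Miss. Laina",
           "Futrelle, Mrs. Jacques Heath (Lily May Peel)",
           "tt",
           "Mr. Jones"),
  stringsAsFactors = FALSE
)

# Same two steps as above: extract the title, then mark names
# that were left unchanged (no title found) as "Boh"
vec <- gsub("^([^,]+, )?([^.]+).*", "\\2", trainData$Name)
vec <- ifelse(vec == trainData$Name, "Boh", vec)

# Direct assignment creates the column with its final name in one step
trainData$FormOfAddress <- vec

# Quick check of how often each form of address occurs
print(table(trainData$FormOfAddress))
```

On the real data it may be worth inspecting the table for unexpected values, since the regular expression may need adjusting for names that do not follow the "Surname, Title. Given names" layout.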