Extracting characters from string

The structure of the dataset is:

> str(trainData)
'data.frame':   891 obs. of  13 variables:
 $ PassengerId: int  1 2 3 4 5 6 7 8 9 10 ...
 $ Survived   : Factor w/ 2 levels "No","Yes": 1 2 2 2 1 1 1 1 2 2 ...
 $ Pclass     : Factor w/ 3 levels "1st","2nd","3rd": 3 1 3 1 3 3 1 3 3 2 ...
 $ Name       : chr  "Braund, Mr. Owen Harris" "Cumings, Mrs. John Bradley (Florence Briggs Thayer)" "Heikkinen, Miss. Laina" "Futrelle, Mrs. Jacques Heath (Lily May Peel)" ...
 $ Sex        : Factor w/ 2 levels "Male","Female": 1 2 2 2 1 1 1 1 2 2 ...
 $ Age        : num  22 38 26 35 35 NA 54 2 27 14 ...
 $ SibSp      : int  1 1 0 1 0 0 0 3 0 1 ...
 $ Parch      : int  0 0 0 0 0 0 0 1 2 0 ...
 $ Ticket     : int  NA NA NA 113803 373450 330877 17463 349909 347742 237736 ...
 $ Fare       : num  7.25 71.28 7.92 53.1 8.05 ...
 $ Cabin      : chr  "" "C85" "" "C123" ...
 $ Embarked   : chr  "S" "C" "S" "S" ...
 $ Area       : Factor w/ 9 levels "","A","B","C",..: 1 4 1 4 1 1 6 1 1 1 ...

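I want to create a new column in the data frame that stores the form of address contained in the Name variable. To do this, I need to extract the strings "Mr", "Mrs", etc. and store them in a new vector. I planned to solve the problem as follows.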

vec <- vector()

for (i in 1 : nrow(trainData)) {
  if (grep("Mr\\.", trainData[i, "Name"]) == 1) {vec[i] <- "Mr"}
  else if (grep("Miss\\.", trainData[i, "Name"]) == 1) {vec[i] <- "Miss"}
  else if (grep("Mrs\\.", trainData[i, "Name"]) == 1) {vec[i] <- "Mrs"}
  else if (grep("Don\\.", trainData[i, "Name"]) == 1) {vec[i] <- "Don"}
  else if (grep("Master\\.", trainData[i, "Name"]) == 1) {vec[i] <- "Master"}
  else {vec[i] <- "Boh"}
}

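.. and then use the cbind function to combine the existing data frame with the new column FormOfAddress. I have not tested the next two lines of code, because I got an error message from the previous block.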

trainData <- as.data.frame(cbind(trainData, vec))
names(trainData)[length(trainData)] <- "FormOfAddress"

Basically, I am stuck at this point:

> vec <- vector()
> for (i in 1 : nrow(trainData)) {
+ if (grep("Mr\\.", trainData[i, c("Name")]) == 1) {vec[i] <- "Mr"}
+ else if (grep("Miss\\.", trainData[i, c("Name")]) == 1) {vec[i] <- "Miss"}
+ else if (grep("Mrs\\.", trainData[i, c("Name")]) == 1) {vec[i] <- "Mrs"}
+ else if (grep("Don\\.", trainData[i, c("Name")]) == 1) {vec[i] <- "Don"}
+ else if (grep("Master\\.", trainData[i, c("Name")]) == 1) {vec[i] <- "Master"}
+ else {vec[i] <- "Boh"; next}
+ }
Error in if (grep("Mr\\.", trainData[i, c("Name")]) == 1) { : 
  argument is of length zero

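The first part of the if statement looks correct to me: it returns TRUE when the string Mr. is contained in Name. The second part also looks fine (at least in the first iteration) and writes the string Mr to the vector vec. I think the problem is in the second iteration, but I can't find a way to make it work.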

trainData$Name

## [1] "Braund, Mr. Owen Harris"                            
## [2] "Cumings, Mrs. John Bradley (Florence Briggs Thayer)"
## [3] "Heikkinen, Miss. Laina"                             
## [4] "Futrelle, Mrs. Jacques Heath (Lily May Peel)"       
## [5] "tt"                                                 
## [6] "Mr. Jones"                                          

for (x in trainData$Name) {
    print(grep("Mr\\.", x))
    print(grepl("Mr\\.", x));
}

## [1] 1
## [1] TRUE
## integer(0)
## [1] FALSE
## integer(0)
## [1] FALSE
## integer(0)
## [1] FALSE
## integer(0)
## [1] FALSE
## [1] 1
## [1] TRUE
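
The output above suggests why the original loop stops: grep() returns integer(0) when nothing matches, so the comparison with 1 has length zero and if() cannot evaluate it. A minimal sketch of the same loop using grepl(), which always returns a single TRUE or FALSE per string, could look like this. The names_demo vector is a made-up sample standing in for trainData$Name, and vec_loop is a name chosen only for this illustration.

```r
# Hypothetical sample standing in for trainData$Name
names_demo <- c("Braund, Mr. Owen Harris",
                "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
                "Heikkinen, Miss. Laina",
                "tt",
                "Mr. Jones")

# Preallocate the result instead of growing an empty vector
vec_loop <- character(length(names_demo))

for (i in seq_along(names_demo)) {
  nm <- names_demo[i]
  # grepl() yields one logical value, so it is safe as an if() condition
  if (grepl("Mr\\.", nm)) {
    vec_loop[i] <- "Mr"
  } else if (grepl("Miss\\.", nm)) {
    vec_loop[i] <- "Miss"
  } else if (grepl("Mrs\\.", nm)) {
    vec_loop[i] <- "Mrs"
  } else if (grepl("Don\\.", nm)) {
    vec_loop[i] <- "Don"
  } else if (grepl("Master\\.", nm)) {
    vec_loop[i] <- "Master"
  } else {
    # No known title found
    vec_loop[i] <- "Boh"
  }
}

print(vec_loop)
```

Because the period is escaped, the pattern Mr\\. does not match "Mrs.", so the order of the branches does not matter here.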

## Doing it without a loop.
## You might have to come up with a different
## regex here depending on the rest of your data
vec <- gsub("^([^,]+, )?([^.]+).*", "\\2", trainData$Name)
## [1] "Mr"   "Mrs"  "Miss" "Mrs"  "tt"   "Mr"  
vec <- ifelse(vec == trainData$Name, "Boh", vec)
## [1] "Mr"   "Mrs"  "Miss" "Mrs"  "Boh"  "Mr"  
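
Once the vector of titles exists, it can probably be attached directly as a new column, which would make the cbind() and names() steps from the question unnecessary. The sketch below assumes a small made-up data frame called demo in place of trainData; the variable name title is also only illustrative.

```r
# Hypothetical stand-in for trainData with just two columns
demo <- data.frame(
  PassengerId = 1:3,
  Name = c("Braund, Mr. Owen Harris", "Heikkinen, Miss. Laina", "tt"),
  stringsAsFactors = FALSE
)

# Same regex as above: skip an optional "Surname, " prefix,
# then keep everything up to the first period
title <- gsub("^([^,]+, )?([^.]+).*", "\\2", demo$Name)

# Strings without a period come back unchanged; label those "Boh"
title <- ifelse(title == demo$Name, "Boh", title)

# Assigning with $ creates and names the column in one step
demo$FormOfAddress <- title

print(demo)
```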
