
Extract data from dataset

I need to extract the title from each name, but I cannot understand how the code works. Here is the code:

combine = [traindata, testdata]

for dataset in combine:
    # store the word between a space and a period (e.g. "Mr") in a new "title" column
    dataset["title"] = dataset["Name"].str.extract(r' ([A-Za-z]+)\.', expand=False)

There is no error, but I need to understand how the code above works.

Name
Braund, Mr. Owen Harris
Cumings, Mrs. John Bradley (Florence Briggs Thayer)
Heikkinen, Miss. Laina
Futrelle, Mrs. Jacques Heath (Lily May Peel)
Allen, Mr. William Henry
Moran, Mr. James

Above is the Name column from the CSV file. dataset["title"] stores the title from each name, such as Mr, Mrs, Miss, Master, etc.
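To see the loop's effect end to end, here is a minimal sketch that rebuilds the Name column from the sample rows above. The split of those rows into traindata and testdata is an assumption (the real CSV files are not shown), so both frames are built in memory here.

```python
import pandas as pd

# Hypothetical stand-ins for the CSV data: the sample names above, split across two frames
traindata = pd.DataFrame({"Name": [
    "Braund, Mr. Owen Harris",
    "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
    "Heikkinen, Miss. Laina",
]})
testdata = pd.DataFrame({"Name": [
    "Futrelle, Mrs. Jacques Heath (Lily May Peel)",
    "Allen, Mr. William Henry",
    "Moran, Mr. James",
]})

combine = [traindata, testdata]

for dataset in combine:
    # Raw string so the backslash in "\." reaches the regex engine unchanged
    dataset["title"] = dataset["Name"].str.extract(r" ([A-Za-z]+)\.", expand=False)

print(traindata["title"].tolist())  # ['Mr', 'Mrs', 'Miss']
print(testdata["title"].tolist())   # ['Mrs', 'Mr', 'Mr']
```

Because combine holds references to the same DataFrame objects, assigning dataset["title"] inside the loop adds the column to traindata and testdata themselves, not to copies.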

Your code extracts the title from each name with the pandas.Series.str.extract function, which applies a regular expression (regex) to every string in the column.

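pandas.Series.str.extract - extracts the capture groups of the regex pat as columns in a DataFrame. For each string it uses the first match of the pattern. With expand=False and a single capture group, it returns a Series instead, which fits a single column like title.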
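The effect of expand can be checked directly. In this sketch, using two of the sample names, a single capture group with expand=False gives a Series, while expand=True (the default) gives a one-column DataFrame whose column is labeled 0 because the group has no name.

```python
import pandas as pd

names = pd.Series(["Braund, Mr. Owen Harris", "Heikkinen, Miss. Laina"])
pattern = r" ([A-Za-z]+)\."

as_series = names.str.extract(pattern, expand=False)  # one group -> Series
as_frame = names.str.extract(pattern, expand=True)    # always a DataFrame

print(type(as_series).__name__)   # Series
print(type(as_frame).__name__)    # DataFrame
print(as_frame.columns.tolist())  # [0]
print(as_series.tolist())         # ['Mr', 'Miss']
```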

' ([A-Za-z]+)\.' is the regex pattern in your code. It matches a space, followed by one or more letters, followed by a literal period. In each Name, that is the title after the comma that follows the surname, such as Mr or Mrs.

[A-Za-z] - this part of the pattern matches a single letter in the range a-z or A-Z.

+ - repeats the preceding part one or more times, so a whole word such as Mrs is matched, not just one letter.

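\. - matches a literal period right after the word. The backslash is needed because an unescaped . matches any character.

The parentheses around [A-Za-z]+ form a capture group: str.extract returns only what is inside them, so the space and the period are matched but not kept.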
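The same pattern can be tried on a single string with Python's standard re module, which makes each part visible. This is an illustrative sketch: re.search, like str.extract, keeps only the first match, while re.findall lists every match. The names "Doe, Dr. J. Smith" and "Doe, Jane Ann" are hypothetical and not from the dataset.

```python
import re

pattern = r" ([A-Za-z]+)\."  # space, one or more ASCII letters (captured), literal period

name = "Cumings, Mrs. John Bradley (Florence Briggs Thayer)"
m = re.search(pattern, name)
print(repr(m.group(0)))  # ' Mrs.'  whole match: space, letters, period
print(m.group(1))        # Mrs      capture group only

# Hypothetical name with two matches: search keeps the first, findall returns both
print(re.search(pattern, "Doe, Dr. J. Smith").group(1))  # Dr
print(re.findall(pattern, "Doe, Dr. J. Smith"))          # ['Dr', 'J']

# With an unescaped ".", any character may follow the letters,
# so a hypothetical name with no title still produces a match
print(re.search(r" ([A-Za-z]+).", "Doe, Jane Ann").group(1))  # Jane
print(re.search(pattern, "Doe, Jane Ann"))                    # None
```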

The pandas.Series.str.extract documentation includes an example that extracts several parts of a string and puts them in separate columns.
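Along the same lines, a pattern with several capture groups produces one column per group, and named groups, written (?P<name>...), become the column names. The pattern below is a hypothetical extension for this Name format (surname, comma, title, period, rest of the name); it is not part of the original code.

```python
import pandas as pd

names = pd.Series([
    "Braund, Mr. Owen Harris",
    "Heikkinen, Miss. Laina",
    "Moran, Mr. James",
])

# Hypothetical pattern: surname before the comma, title before the first ". ", then the rest
parts = names.str.extract(r"^(?P<surname>[^,]+), (?P<title>[A-Za-z]+)\. (?P<given>.*)$")

print(parts.columns.tolist())  # ['surname', 'title', 'given']
print(parts.loc[0].tolist())   # ['Braund', 'Mr', 'Owen Harris']
```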

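I found this particular answer and its link very helpful for understanding how to use the str.extract method, how it puts the extracted strings into columns or a Series, and what changes when expand is switched from True to False.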


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM