Extract substring between two characters - python DataFrame

What is the meaning of the regex pattern ',\\s*([^\\.]*)\\s*\\.'?

I have a DataFrame identical to the one in "Extract sub-string between 2 special characters from one column of Pandas DataFrame" and want to extract the substring located between "," and ".". Thanks to the answer on that post, one way to do it is:

In [157]: df['Title'] = df.Name.str.extract(r',\s*([^\.]*)\s*\.', expand=False)

In [158]: df
Out[158]:
                   Name   Title
0        Jim, Mr. Jones      Mr
1     Sara, Miss. Baker    Miss
2     Leila, Mrs. Jacob     Mrs
3  Ramu, Master. Kuttan  Master

The result is correct, but what does ',\\s*([^\\.]*)\\s*\\.' mean? In particular, what do '*' and '\\' mean?

The pattern matches:

  • a , (comma)
  • followed by \\s* zero or more whitespace characters (tabs, spaces, etc.)
  • followed by ([^\\.]*) a capture group of zero or more characters that are not a . (dot); this group is what str.extract returns
  • followed by \\s* zero or more whitespace characters
  • followed by a \\. (literal dot)
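
The breakdown above can be checked with Python's built-in re module, without pandas. This is only a minimal sketch: the names pattern, names and titles are illustrative and not from the original post. It also shows that '\\' in an ordinary string literal is a single backslash, so the double-backslash spelling and the raw-string spelling used in In [157] describe the same regex.

```python
import re

# In an ordinary string literal '\\' stands for one backslash, so the
# non-raw spelling and the raw-string spelling are the same pattern.
assert ',\\s*([^\\.]*)\\s*\\.' == r',\s*([^\.]*)\s*\.'

pattern = re.compile(r',\s*([^\.]*)\s*\.')

# Sample values copied from the Name column in the question.
names = [
    "Jim, Mr. Jones",
    "Sara, Miss. Baker",
    "Leila, Mrs. Jacob",
    "Ramu, Master. Kuttan",
]

# group(1) is the text captured by the parentheses, i.e. the part
# between the comma (plus optional whitespace) and the dot.
titles = [pattern.search(name).group(1) for name in names]
print(titles)  # ['Mr', 'Miss', 'Mrs', 'Master']
```

The * quantifier means "zero or more of the preceding item", which is why the pattern still matches when there is no space after the comma.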

You can find more about regex here.

UPDATE

As @UnbearableLightness mentioned, the \\ used to escape the . (dot) is redundant inside a character set. A character set is anything defined between [ and ].
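
As a quick check of this point, the sketch below compares the pattern with and without the backslash inside the character set, using the standard re module. The variable names (escaped, unescaped, samples) are made up for illustration. It also shows that outside a character set the backslash still matters.

```python
import re

# Same pattern, with and without the backslash inside [...].
escaped = re.compile(r',\s*([^\.]*)\s*\.')
unescaped = re.compile(r',\s*([^.]*)\s*\.')

samples = [
    "Jim, Mr. Jones",
    "Sara, Miss. Baker",
    "Leila, Mrs. Jacob",
    "Ramu, Master. Kuttan",
]

# Both spellings capture the same text for every sample.
same = all(
    escaped.search(s).group(1) == unescaped.search(s).group(1)
    for s in samples
)
print(same)  # True

# Outside a character set the dot is a wildcard unless it is escaped.
wildcard_matches_x = re.fullmatch(r'.', 'x') is not None   # True
literal_matches_x = re.fullmatch(r'\.', 'x') is not None   # False
```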

