
Extract a specific value from a string column of a pandas DataFrame

I am new to Python. I have output from a plugin that arrives as an Excel sheet, and I need to extract values from one of its columns.

  Plugin Output

 Country:USA   State: Virginia Address: 23 xys lane  SSN:2345550404  Zip : 22102 City: Fairfax

 Country:India State:Virginia  SSN:2345550401  ZIP:452002  City: Indore

I need to find the SSN in each row and store it in a separate column of a new pandas DataFrame.

Desired Output:

  SSN

 2345550404

 2345550401

Answer for the Serial Number case:

import re

def find_serialnumber(x):
    # Capture everything after "Serial Number:" up to the end of the line
    num = re.findall(r'Serial Number:\s*([^\n]+)', x)
    return " ".join(num)

    def find_number(x):
        # Match "SSN", a colon (optional spaces around it), then the digits
        num = re.findall(r'SSN\s*:\s*(\d+)', x)
        return " ".join(num)

    df['SSN'] = df['Output'].apply(find_number)
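As a quick sanity check, a pattern of the form SSN\s*:\s*(\d+) can be tried on the two sample rows from the question using only the standard library. This is a minimal sketch: the names sample_rows, SSN_PATTERN and extract_ssn are illustrative, and it assumes the SSN is always written as "SSN", a colon, then digits.

```python
import re

# Sample rows copied from the question's "Plugin Output" column
sample_rows = [
    "Country:USA   State: Virginia Address: 23 xys lane  SSN:2345550404  Zip : 22102 City: Fairfax",
    "Country:India State:Virginia  SSN:2345550401  ZIP:452002  City: Indore",
]

# Assumed format: "SSN", optional spaces, a colon, optional spaces, then digits
SSN_PATTERN = re.compile(r"SSN\s*:\s*(\d+)")

def extract_ssn(text):
    """Return the first SSN found in text, or None if there is none."""
    match = SSN_PATTERN.search(text)
    return match.group(1) if match else None

ssns = [extract_ssn(row) for row in sample_rows]
print(ssns)  # ['2345550404', '2345550401']
```

Returning None for a row without an SSN keeps missing values easy to spot later, rather than mixing them with real results.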

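Alternatively, do it in one line with apply (pandas also has a built-in str.extract method for this). In the pattern, \d+ matches one or more digits.

# Falls back to the original string when no SSN is found
df['SSN'] = df['Output'].apply(lambda x: re.findall(r'SSN\s*:\s*(\d+)', x)[0] if re.findall(r'SSN\s*:\s*(\d+)', x) else x)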
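For comparison, here is a small sketch of the same extraction with pandas' Series.str.extract, which avoids apply altogether. The column name Output follows the answers above. The third row is a made-up example with no SSN, added only to show that unmatched rows come back as NaN.

```python
import pandas as pd

df = pd.DataFrame({
    "Output": [
        "Country:USA   State: Virginia Address: 23 xys lane  SSN:2345550404  Zip : 22102 City: Fairfax",
        "Country:India State:Virginia  SSN:2345550401  ZIP:452002  City: Indore",
        "Country:USA   State: Ohio",  # hypothetical row with no SSN
    ]
})

# With one capture group, expand=False returns a Series;
# rows where the pattern does not match become NaN
df["SSN"] = df["Output"].str.extract(r"SSN\s*:\s*(\d+)", expand=False)

print(df[["SSN"]])
```

The extracted values are strings. If numbers are needed, something like pd.to_numeric(df["SSN"]) could be applied afterwards, though identifiers such as SSNs are usually safer kept as text.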
