
Split column into multiple columns based on content of column in Pandas

I have a column with data like this:

Ticket NO: 123456789; Location ID:ABC123; Type:Network;

Ticket No. 132123456, Location ID:ABC444; Type:App

Tickt#222256789; Location ID:AMC121; Type:Network;

This is what I have tried so far:

new = data["Description"].str.split(";", n=2, expand=True)  # n=2 yields three columns
data["Ticket"] = new[0]
data["Location"] = new[1]
data["Type"] = new[2]

# Drop the old column
data.drop(columns=["Description"], inplace=True)

I can split on ";", but how do I split on both ";" and ","?

Here is a more general solution that lets you do as much processing as you like. Let's start by defining an example dataframe for easy debugging:

import re
import pandas as pd

df = pd.DataFrame({'Description': [
    'Ticket NO: 123456789 , Location ID:ABC123; Type:Network;',
    'Ticket NO: 123456789 ; Location ID:ABC123; Type:Network;']})

Then, let's define our processing function, where you can do anything you like:

def process(row):
    parts = re.split(r'[,;]', row)
    return pd.Series({'Ticket': parts[0], 'Location': parts[1], 'Type': parts[2]})

Besides splitting on , or ; and assigning the three sections, you can add code that strips whitespace, removes whatever is to the left of the colon, and so on. For example, try:

def process(row):
    parts = re.split(r'[,;]', row)
    data = {}
    for part in parts:
        for field in ['Ticket', 'Location', 'Type']:
            if field.lower() in part.lower():
                data[field] = part.split(':')[1].strip()
    return pd.Series(data)
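
Note that the version above assumes every segment contains a colon and the exact field name, so rows from the question such as "Ticket No. 132123456" (no colon) or "Tickt#222256789" (misspelled) would raise an IndexError or be skipped. Below is a minimal sketch of a more tolerant variant. The name process_tolerant and the per-field patterns are assumptions made for illustration and are not part of the original answer:

```python
import re

import pandas as pd

# Hypothetical per-field patterns: a keyword, then the value after any
# separator characters. They are chosen to fit the sample rows only.
FIELD_PATTERNS = {
    'Ticket': re.compile(r'tic[a-z]*\D*(\d+)', re.I),
    'Location': re.compile(r'location\s*id\W*(\w+)', re.I),
    'Type': re.compile(r'type\W*(\w+)', re.I),
}


def process_tolerant(row):
    data = {}
    for part in re.split(r'[,;]', row):
        for field, pattern in FIELD_PATTERNS.items():
            match = pattern.search(part)
            if match:
                data[field] = match.group(1)
    # reindex keeps all three labels, with NaN for any field that was not found
    return pd.Series(data).reindex(list(FIELD_PATTERNS))


df = pd.DataFrame({'Description': [
    'Ticket NO: 123456789; Location ID:ABC123; Type:Network;',
    'Ticket No. 132123456, Location ID:ABC444; Type:App',
    'Tickt#222256789; Location ID:AMC121; Type:Network;']})

result = df['Description'].apply(process_tolerant)
print(result)
```

Searching each segment with its own small pattern keeps the "one function, any processing you like" structure while avoiding the dependency on a colon being present.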

Finally, apply to get the result:

df['Description'].apply(process)

This is much more readable and easily maintainable than doing everything in a single regex, especially as you might end up needing additional processing.

The output of this application will look like this:

[Image: partial output]

To add this output to the original dataframe, simply run:

df[['Ticket', 'Location', 'Type']] = df['Description'].apply(process)

[Image: full output]

You can pass a regex character class to Series.str.split:

new = data["Description"].str.split("[;,]", n = 2, expand = True)
new.columns = ['Ticket', 'Location', 'Type']

Output:

>>> new
                  Ticket             Location            Type
0  Ticket NO: 123456789    Location ID:ABC123   Type:Network;
1   Ticket No. 132123456   Location ID:ABC444        Type:App
2       Tickt#222256789    Location ID:AMC121   Type:Network;

The [;,] regex matches either a ; or a , character, and n=2 limits the number of splits to two, which yields three columns.
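
The split output still carries leading whitespace, a trailing ; and the field labels. As a rough sketch of one way to tidy it up (the sample data frame and the cleanup steps are assumptions added for illustration), the following also passes regex=True, which should be available on pandas 1.4 or later and makes it explicit that the separator is a pattern:

```python
import pandas as pd

data = pd.DataFrame({'Description': [
    'Ticket NO: 123456789; Location ID:ABC123; Type:Network;',
    'Ticket No. 132123456, Location ID:ABC444; Type:App',
    'Tickt#222256789; Location ID:AMC121; Type:Network;']})

# regex=True states explicitly that "[;,]" is a pattern, not a literal string
new = data['Description'].str.split(r'[;,]', n=2, expand=True, regex=True)
new.columns = ['Ticket', 'Location', 'Type']

# Strip surrounding spaces and any trailing ";" left over from the split
new = new.apply(lambda col: col.str.strip(' ;'))

# Keep only the digits of the ticket, and the value after the ":" elsewhere
new['Ticket'] = new['Ticket'].str.extract(r'(\d+)', expand=False)
new['Location'] = new['Location'].str.split(':').str[-1]
new['Type'] = new['Type'].str.split(':').str[-1]
print(new)
```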

Another solution, using a regex with Series.str.extract:

>>> new[['Ticket', 'Location', 'Type']] = data['Description'].str.extract(r"(?i)Ticke?t\D*(\d+)\W*Location ID\W*(\w+)\W*Type:(\w+)")
>>> new
      Ticket Location     Type
0  123456789   ABC123  Network
1  132123456   ABC444      App
2  222256789   AMC121  Network

Details of the regex:

  • (?i) - case-insensitive flag
  • Ticke?t - Ticket with the second e optional, so the misspelled Tickt also matches
  • \D* - zero or more non-digit chars
  • (\d+) - Group 1: one or more digits
  • \W* - zero or more non-word chars
  • Location ID - a literal string
  • \W* - zero or more non-word chars
  • (\w+) - Group 2: one or more word chars
  • \W* - zero or more non-word chars
  • Type: - a literal string
  • (\w+) - Group 3: one or more word chars
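
The breakdown above can also be written directly into the pattern. The sketch below is one possible rendering using re.VERBOSE and named groups; the group names and the sample data frame are assumptions added for illustration. With named groups, str.extract uses the group names as column names:

```python
import re

import pandas as pd

# Same building blocks as the list above, one per line. In verbose mode
# unescaped whitespace is ignored, so the space in "Location ID" is escaped.
PATTERN = r"""
    Ticke?t               # "Ticket", or the misspelling "Tickt"
    \D*                   # zero or more non-digit chars
    (?P<Ticket>\d+)       # ticket number
    \W*Location\ ID\W*    # literal label with optional separators around it
    (?P<Location>\w+)     # location code
    \W*Type:              # literal label
    (?P<Type>\w+)         # ticket type
"""

data = pd.DataFrame({'Description': [
    'Ticket NO: 123456789; Location ID:ABC123; Type:Network;',
    'Ticket No. 132123456, Location ID:ABC444; Type:App',
    'Tickt#222256789; Location ID:AMC121; Type:Network;']})

extracted = data['Description'].str.extract(
    PATTERN, flags=re.IGNORECASE | re.VERBOSE)
print(extracted)
```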

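Another approach using str.extract. Note that [Ticket\sNO:.#] is a character class, so it matches a single character (here the space or # just before the digits) rather than the literal text "Ticket NO:".

Example: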

import re

df[['Ticket', 'Location', 'Type']] = df['Description'].str.extract(r"[Ticket\sNO:.#](\d+).*ID:([A-Z0-9]+).*Type:([A-Za-z]+)", flags=re.I)
print(df[['Ticket', 'Location', 'Type']])

Output:

      Ticket Location     Type
0  123456789   ABC123  Network
1  132123456   ABC444      App
2  222256789   AMC121  Network


 