Create more than one column from an existing column

I have a column whose data I want to parse and spread across several new columns. So far, the best I can get is with the apply method:

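def parse_parent_names(row):
    # Splitting '|John|Doe|Bobba|' on '|' yields empty strings at both ends;
    # [1:-1] drops them and keeps every name.
    split = row.person_with_parent_names.split('|')[1:-1]
    return split

# Store the list of names for each row in a single column.
df['parsed'] = df.apply(parse_parent_names, axis=1)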

The data is a pandas DataFrame with a column of names separated by pipes (|):

'person_with_parent_names'
|John|Doe|Bobba|
|Fett|Bobba|
|Abe|Bea|Cosby|

The rightmost name is the person and the leftmost is the "grandest parent". I'd like to transform this into three columns, like:

'grandfather'    'father'    'person'
John             Doe         Bobba
                 Fett        Bobba
Abe              Bea         Cosby

But with apply, the best I can get is

'parsed'
[John, Doe, Bobba]
[Fett, Bobba]
[Abe, Bea, Cosby]

I could use apply three times, but it would not be efficient to read the entire dataset three times.

Change your function to count the number of | characters and pick the slice with a conditional (ternary) expression, then pass the resulting lists to the DataFrame constructor:

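import pandas as pd

def parse_parent_names(row):
    # row is a single string such as '|John|Doe|Bobba|'.
    # Four pipes mean three names are present (grandfather, father, person).
    m = row.count('|') == 4
    # With three names, drop both empty edge strings; with two names, keep the
    # leading empty string so it fills the missing grandfather slot.
    split = row.split('|')[1:-1] if m else row.split('|')[:-1]
    return split

cols = ['grandfather','father','person']
# Build the new frame in one pass over the column.
df1 = pd.DataFrame([parse_parent_names(x) for x in df.person_with_parent_names],
                    columns=cols)
print(df1)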
  grandfather father person
0        John    Doe  Bobba
1               Fett  Bobba
2         Abe    Bea  Cosby
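
The function above relies on a row having either two or three names. As a rough sketch of the same idea for rows with fewer names, the list can be left-padded with empty strings so the person always ends up in the last column. The helper name pad_parent_names and its width parameter are made up for this illustration, and keeping only the last three names when a row has more is an assumption, not something the question specifies:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "person_with_parent_names": ["|John|Doe|Bobba|", "|Fett|Bobba|", "|Abe|Bea|Cosby|"]
})

cols = ["grandfather", "father", "person"]

def pad_parent_names(value, width=len(cols)):
    """Hypothetical helper: split a '|a|b|c|' string and left-pad to `width` names."""
    # Drop the leading and trailing pipes, then split into individual names.
    names = value.strip("|").split("|")
    # Assumption: if a row has more names than columns, keep only the last `width`.
    names = names[-width:]
    # Left-pad with empty strings so the person always lands in the last column.
    return [""] * (width - len(names)) + names

# One pass over the column; reuse the original index so the result lines up with df.
df1 = pd.DataFrame(
    [pad_parent_names(v) for v in df["person_with_parent_names"]],
    columns=cols,
    index=df.index,
)
print(df1)
```

Because df1 keeps the index of df, the new columns can be attached to the original frame with df.join(df1) if they are needed side by side.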
