
How to assign specific substring to column in Python or Pandas

I am relatively new to Python. I am trying to understand how I can break down a column by extracting substrings and then assigning each substring to a specific column. Please see below for what I want to do:

Output that I want:

[image: enter image description here]

So far I have used the following code to split one column into multiple columns, but it assigns the substrings by position rather than placing each one in the specific column I want. Is there a way to do this with Python?

my_ingredients = my_ingredients.str.split(',', expand=True)
my_ingredients.head()

The output I am getting now is as shown below:

Wrong output which I don't want:

[image: enter image description here]

Any suggestions on how I can do this with Python?

Thank you!

If the dataframe looks like

import pandas as pd

df = pd.DataFrame({"My_Date":["Apple,Cat,Banana","Banana,Cat,Apple","Apple,Banana,Cat","Cat,Apple,Banana"]})

#              My_Date
#  0  Apple,Cat,Banana
#  1  Banana,Cat,Apple
#  2  Apple,Banana,Cat
#  3  Cat,Apple,Banana

Then maybe

df = df['My_Date'].apply(lambda x: pd.Series(sorted(x.split(','))))

#         0       1    2
#  0  Apple  Banana  Cat
#  1  Apple  Banana  Cat
#  2  Apple  Banana  Cat
#  3  Apple  Banana  Cat

is what you're looking for. It simply sorts the elements alphabetically.


Note, though, that this only lines the columns up when every row is a permutation of the same three elements. In practice, the comma-separated values will probably vary in both content and length, so some columns will have to be empty for some rows. If instead you have a dataframe that looks more like

df = pd.DataFrame({"My_Date":["Apple,Cat,Banana","Banana,Cat,Apple,Elephant","Banana,Cat","Cat,Dog,Apple,Banana"]})

#                       My_Date
#  0           Apple,Cat,Banana
#  1  Banana,Cat,Apple,Elephant
#  2                 Banana,Cat
#  3       Cat,Dog,Apple,Banana

Then you could try something like

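# Every distinct element across all rows, in alphabetical order
unique_elements = sorted({e for s in df['My_Date'] for e in s.split(',')})

# One boolean column per element; a separate name keeps the original df intact
flags = pd.DataFrame({e: [e in s.split(',') for s in df['My_Date']] for e in unique_elements})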

#     Apple  Banana   Cat    Dog  Elephant
#  0   True    True  True  False     False
#  1   True    True  True  False      True
#  2  False    True  True  False     False
#  3   True    True  True   True     False
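
As a side note, pandas has a built-in string method, Series.str.get_dummies, that should produce the same kind of indicator table without the explicit loop. Below is a minimal sketch under the assumption that the data is the same sample frame as above; the name `dummies` is only for illustration, and the `.astype(bool)` is there because the method returns 0/1 indicators rather than True/False.

```python
import pandas as pd

# Same sample data as in the answer above
df = pd.DataFrame({"My_Date": ["Apple,Cat,Banana",
                               "Banana,Cat,Apple,Elephant",
                               "Banana,Cat",
                               "Cat,Dog,Apple,Banana"]})

# Split each string on ',' and build one indicator column per distinct element;
# astype(bool) turns the 0/1 indicators into True/False
dummies = df["My_Date"].str.get_dummies(sep=",").astype(bool)

print(dummies)
```

If the original column should stay next to the indicators, something like df.join(dummies) ought to work, since both share the same index.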

Or, if you insist on having the element names as the values, you could go for

import numpy as np

# Element name where it is present in the row, NaN where it is not
names = pd.DataFrame({e: [e if e in s.split(',') else np.nan for s in df['My_Date']] for e in unique_elements})

#     Apple  Banana  Cat  Dog  Elephant
#  0  Apple  Banana  Cat  NaN       NaN
#  1  Apple  Banana  Cat  NaN  Elephant
#  2    NaN  Banana  Cat  NaN       NaN
#  3  Apple  Banana  Cat  Dog       NaN
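
The same name-or-NaN layout can probably also be reached without the nested comprehension, by masking a constant column with Series.where. This is only a sketch under assumptions: `present` and `labels` are made-up names, and the indicator table is built here with Series.str.get_dummies so that the snippet runs on its own.

```python
import pandas as pd

# Same sample data as in the answer above
df = pd.DataFrame({"My_Date": ["Apple,Cat,Banana",
                               "Banana,Cat,Apple,Elephant",
                               "Banana,Cat",
                               "Cat,Dog,Apple,Banana"]})

# True where the element occurs in the row, False otherwise
present = df["My_Date"].str.get_dummies(sep=",").astype(bool)

# For each element, start from a column filled with its own name and
# keep the name only where the element is present; the rest becomes NaN
labels = pd.DataFrame({
    col: pd.Series(col, index=present.index).where(present[col])
    for col in present.columns
})

print(labels)
```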
