I am relatively new to Python. I am trying to understand how to break down a column by extracting substrings and then assigning each substring to a specific column. Please see below for what I want to do:
Output that I want:
So far I have used the following code to break one column down into multiple columns, but it simply splits the string into columns in the order the substrings appear, not in the specific order I want. Is there a way to do this with Python?
my_ingredients = my_ingredients.str.split(',', expand=True)
my_ingredients.head()
The output I am getting now is as shown below:
Wrong output which I don't want:
Any suggestions on how I can do this with Python?
Thank you!
If the dataframe looks like
import pandas as pd
df = pd.DataFrame({"My_Date":["Apple,Cat,Banana","Banana,Cat,Apple","Apple,Banana,Cat","Cat,Apple,Banana"]})
# My_Date
# 0 Apple,Cat,Banana
# 1 Banana,Cat,Apple
# 2 Apple,Banana,Cat
# 3 Cat,Apple,Banana
Then maybe
df = df['My_Date'].apply(lambda x: pd.Series(sorted(x.split(','))))
# 0 1 2
# 0 Apple Banana Cat
# 1 Apple Banana Cat
# 2 Apple Banana Cat
# 3 Apple Banana Cat
is what you're looking for. It simply sorts the elements alphabetically.
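Alphabetical sorting only helps if alphabetical order happens to be the column order you want. If the target order is something else, one minimal sketch is to sort each row by position in an explicit list. Here `desired_order` is a hypothetical name standing in for whatever order the output should have, and the sketch assumes every value in the data appears in that list:

```python
import pandas as pd

df = pd.DataFrame({"My_Date": ["Apple,Cat,Banana", "Banana,Cat,Apple",
                               "Apple,Banana,Cat", "Cat,Apple,Banana"]})

# Hypothetical target order; replace with the order you actually want.
desired_order = ["Cat", "Apple", "Banana"]

# Sort each row's tokens by their index in desired_order instead of alphabetically,
# then build the frame from a list of lists and label the columns with that order.
ordered = pd.DataFrame(
    [sorted(s.split(","), key=desired_order.index) for s in df["My_Date"]],
    index=df.index,
    columns=desired_order,
)
print(ordered)
```

Because `key=desired_order.index` raises `ValueError` for any value missing from the list, this has the same limitation as the alphabetical version: it only works when every row is a permutation of the same set of values.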
Note, though, that this solution will not put values in the right columns if some rows of the original dataframe are not permutations of those three elements. In practice, the comma-separated values may vary both in which elements appear and in how many there are, so some rows will leave certain columns empty. If instead you have a dataframe that looks more like
df = pd.DataFrame({"My_Date":["Apple,Cat,Banana","Banana,Cat,Apple,Elephant","Banana,Cat","Cat,Dog,Apple,Banana"]})
# My_Date
# 0 Apple,Cat,Banana
# 1 Banana,Cat,Apple,Elephant
# 2 Banana,Cat
# 3 Cat,Dog,Apple,Banana
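Applying the sorting idea to this data shows the problem. The sketch below builds the sorted rows from a list of lists (shorter rows are padded with missing values), and the same position can now hold different elements in different rows:

```python
import pandas as pd

df = pd.DataFrame({"My_Date": ["Apple,Cat,Banana", "Banana,Cat,Apple,Elephant",
                               "Banana,Cat", "Cat,Dog,Apple,Banana"]})

# Sort each row alphabetically; rows with fewer items are padded on the right.
sorted_cols = pd.DataFrame([sorted(s.split(",")) for s in df["My_Date"]], index=df.index)
print(sorted_cols)
```

Column 0 holds "Apple" in most rows but "Banana" in row 2, and column 3 mixes "Elephant" and "Dog", so position no longer identifies the element.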
Then you could try something like
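# every distinct, non-empty value across all rows, sorted alphabetically
unique_elements = sorted({e for s in df['My_Date'] for e in s.split(',') if e})
# one True/False column per element; assigned to a new name so df['My_Date'] stays available
flags = pd.DataFrame({e: [e in s.split(',') for s in df['My_Date']] for e in unique_elements})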
# Apple Banana Cat Dog Elephant
# 0 True True True False False
# 1 True True True False True
# 2 False True True False False
# 3 True True True True False
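pandas can also build this table directly: `Series.str.get_dummies(sep=...)` splits each string on the separator and returns one 0/1 column per distinct value, with the columns sorted alphabetically. A sketch on the same data, converting the 0/1 values to booleans to match the table above:

```python
import pandas as pd

df = pd.DataFrame({"My_Date": ["Apple,Cat,Banana", "Banana,Cat,Apple,Elephant",
                               "Banana,Cat", "Cat,Dog,Apple,Banana"]})

# One indicator column per distinct token; get_dummies returns 0/1, so cast to bool.
flags = df["My_Date"].str.get_dummies(sep=",").astype(bool)
print(flags)
```

This avoids splitting every string once per element, which the dictionary-comprehension version does.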
Or, if you would rather have the element names themselves as the values (with NaN where an element is missing), you could use
import numpy as np
df = pd.DataFrame({e: [e if e in s.split(',') else np.nan for s in df['My_Date']] for e in unique_elements})
# Apple Banana Cat Dog Elephant
# 0 Apple Banana Cat NaN NaN
# 1 Apple Banana Cat NaN Elephant
# 2 NaN Banana Cat NaN NaN
# 3 Apple Banana Cat Dog NaN
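Starting from a boolean table like the `get_dummies` one, the same name-or-NaN result can be produced in a single vectorized step. This sketch uses `numpy.where`, which broadcasts the row of column names across every row of the mask:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"My_Date": ["Apple,Cat,Banana", "Banana,Cat,Apple,Elephant",
                               "Banana,Cat", "Cat,Dog,Apple,Banana"]})

flags = df["My_Date"].str.get_dummies(sep=",").astype(bool)

# Where a flag is True, take that column's name; otherwise NaN.
names = pd.DataFrame(
    np.where(flags, flags.columns.to_numpy(), np.nan),
    index=flags.index,
    columns=flags.columns,
)
print(names)
```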