How to assign specific substring to column in Python or Pandas
I am relatively new to Python. I am trying to understand how I can break down a column by extracting substrings and then assigning each substring to a specific column. Please see below for what I want to do:
Output that I want:
So far I have used the following code to break one column into multiple columns, but this simply splits the string by position and does not place each substring in the specific column I want. Is there a way to do this with Python?
my_ingredients = my_ingredients.str.split(',',expand = True)
my_ingredients.head()
The output I am getting now is as shown below:
Wrong output which I don't want:
Any suggestions on how I can do this with Python?
Thank you!
If the dataframe looks like
import pandas as pd
df = pd.DataFrame({"My_Date":["Apple,Cat,Banana","Banana,Cat,Apple","Apple,Banana,Cat","Cat,Apple,Banana"]})
# My_Date
# 0 Apple,Cat,Banana
# 1 Banana,Cat,Apple
# 2 Apple,Banana,Cat
# 3 Cat,Apple,Banana
Then maybe
df = df['My_Date'].apply(lambda x: pd.Series(sorted(x.split(','))))
# 0 1 2
# 0 Apple Banana Cat
# 1 Apple Banana Cat
# 2 Apple Banana Cat
# 3 Apple Banana Cat
is what you're looking for. It simply sorts the elements of each row alphabetically.
Note, though, that this solution will not put the values in the right columns if the original dataframe has rows that aren't a permutation of those three elements. I would imagine that for practical purposes, you might have comma-separated values with varying elements and lengths. In this case, some columns will have empty entries.
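To make that caveat concrete, here is a small sketch of the same sort-then-spread idea applied to rows that do not all contain the same elements. The `df_demo` frame and the `sorted_cols` name are made up for illustration, and the list comprehension is just one way to write the step.

```python
import pandas as pd

# Hypothetical sample: rows differ in which elements they contain.
df_demo = pd.DataFrame(
    {"My_Date": ["Apple,Cat,Banana", "Banana,Cat", "Cat,Dog,Apple,Banana"]}
)

# Sort each row's elements and spread them over positional columns 0, 1, 2, ...
# Shorter rows are padded with missing values on the right.
sorted_cols = pd.DataFrame([sorted(s.split(",")) for s in df_demo["My_Date"]])
print(sorted_cols)

# Column 0 now holds "Apple" for rows 0 and 2 but "Banana" for row 1,
# so a column position no longer identifies a single element.
```

Sorting only lines the values up when every row has the same set of elements, which is why the approach further down keys the columns by element name instead of by position.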
If instead you have a dataframe that looks more like
df = pd.DataFrame({"My_Date":["Apple,Cat,Banana","Banana,Cat,Apple,Elephant","Banana,Cat","Cat,Dog,Apple,Banana"]})
# My_Date
# 0 Apple,Cat,Banana
# 1 Banana,Cat,Apple,Elephant
# 2 Banana,Cat
# 3 Cat,Dog,Apple,Banana
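Then you could try something like
# collect every distinct element across all rows, sorted for a stable column order
unique_elements = sorted({e for s in df['My_Date'] for e in s.split(',')})
# build the table under a new name so that df['My_Date'] stays available
has_element = pd.DataFrame({e: [e in s.split(',') for s in df['My_Date']] for e in unique_elements})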
# Apple Banana Cat Dog Elephant
# 0 True True True False False
# 1 True True True False True
# 2 False True True False False
# 3 True True True True False
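As a possible alternative for the same membership table, pandas provides `Series.str.get_dummies`, which splits each string on a separator and returns one indicator column per distinct value. The sketch below assumes the elements contain no stray whitespace and never include the separator themselves; `df_demo` and `presence` are illustrative names.

```python
import pandas as pd

# Same sample data as above, under a hypothetical name.
df_demo = pd.DataFrame(
    {"My_Date": ["Apple,Cat,Banana", "Banana,Cat,Apple,Elephant",
                 "Banana,Cat", "Cat,Dog,Apple,Banana"]}
)

# One indicator column per distinct element, converted to booleans.
presence = df_demo["My_Date"].str.get_dummies(sep=",").astype(bool)
print(presence)
```

This avoids building the list of unique elements by hand, at the cost of being less explicit about what counts as an element.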
Or, if you insist on having the name of the element as the values, then you could go for
import numpy as np
# assumes df still has the My_Date column and unique_elements is the sorted list of distinct elements
by_element = pd.DataFrame({e: [e if e in s.split(',') else np.nan for s in df['My_Date']] for e in unique_elements})
# Apple Banana Cat Dog Elephant
# 0 Apple Banana Cat NaN NaN
# 1 Apple Banana Cat NaN Elephant
# 2 NaN Banana Cat NaN NaN
# 3 Apple Banana Cat Dog NaN
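The name-valued table can also be sketched on top of an indicator table, which may be convenient if you want both views. This is only one way to do it: `df_demo`, `presence` and `labels` are illustrative names, and the same assumptions apply (no stray whitespace, no separator inside an element).

```python
import pandas as pd

# Same sample data as above, under a hypothetical name.
df_demo = pd.DataFrame(
    {"My_Date": ["Apple,Cat,Banana", "Banana,Cat,Apple,Elephant",
                 "Banana,Cat", "Cat,Dog,Apple,Banana"]}
)

# Boolean table: is each element present in each row?
presence = df_demo["My_Date"].str.get_dummies(sep=",").astype(bool)

# For every column, start from a Series filled with the column's own name
# and keep it only where the element is present; other cells become missing.
labels = pd.DataFrame(
    {c: pd.Series(c, index=presence.index).where(presence[c])
     for c in presence.columns}
)
print(labels)
```

Each element ends up in the column that carries its name, regardless of where it appeared in the original string.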