
How to assign specific substring to column in Python or Pandas

I am relatively new to Python. I am trying to understand how I can break down a column by extracting substrings and then assigning each substring to a specific column. Please see below for what I want to do:

Output that I want:

[image]

So far I have used the following code to break one column down into multiple columns, but it simply splits the string into columns in the order the values appear, not in the specific order I want. Is there a way to do this with Python?

my_ingredients = my_ingredients.str.split(',',expand = True)
my_ingredients.head()

The output I am getting now is shown below.

Wrong output, which I don't want:

[image]

Any suggestions on how I can do this with Python?

Thank you!

If the dataframe looks like

import pandas as pd

df = pd.DataFrame({"My_Date":["Apple,Cat,Banana","Banana,Cat,Apple","Apple,Banana,Cat","Cat,Apple,Banana"]})

#              My_Date
#  0  Apple,Cat,Banana
#  1  Banana,Cat,Apple
#  2  Apple,Banana,Cat
#  3  Cat,Apple,Banana

Then maybe

df = df['My_Date'].apply(lambda x: pd.Series(sorted(x.split(','))))

#         0       1    2
#  0  Apple  Banana  Cat
#  1  Apple  Banana  Cat
#  2  Apple  Banana  Cat
#  3  Apple  Banana  Cat

is what you're looking for. It simply sorts the elements of each row alphabetically.
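The same table can also be built without creating a `pd.Series` for every row, which can be slow on large frames. A minimal sketch of that variant, not part of the original answer; the name `sorted_cols` is only for illustration:

```python
import pandas as pd

df = pd.DataFrame({"My_Date": ["Apple,Cat,Banana", "Banana,Cat,Apple",
                               "Apple,Banana,Cat", "Cat,Apple,Banana"]})

# Split each string, sort the pieces, and let DataFrame build the columns
sorted_cols = pd.DataFrame(
    [sorted(s.split(",")) for s in df["My_Date"]],
    index=df.index,  # keep the original row labels
)
```

The result has the same integer column labels (0, 1, 2) as the `apply` version.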


Note, though, that this solution will not put the values in the right columns if some rows in the original dataframe are not a permutation of those three elements. In practice, the comma-separated values are likely to vary in both their elements and their number, so some columns will have empty entries. If instead you have a dataframe that looks more like

df = pd.DataFrame({"My_Date":["Apple,Cat,Banana","Banana,Cat,Apple,Elephant","Banana,Cat","Cat,Dog,Apple,Banana"]})

#                       My_Date
#  0           Apple,Cat,Banana
#  1  Banana,Cat,Apple,Elephant
#  2                 Banana,Cat
#  3       Cat,Dog,Apple,Banana

Then you could try something like

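# unique_elements is not defined in the answer; assumed here to be every distinct element, sorted
unique_elements = sorted({e for s in df['My_Date'] for e in s.split(',')})

df = pd.DataFrame({e: [e in s.split(',') for s in df['My_Date']] for e in unique_elements})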

#     Apple  Banana   Cat    Dog  Elephant
#  0   True    True  True  False     False
#  1   True    True  True  False      True
#  2  False    True  True  False     False
#  3   True    True  True   True     False

Or, if you insist on having the name of the element as the value, then you could go for

import numpy as np

# Run this on the original df (the one that still has the My_Date column)
df = pd.DataFrame({e: [e if e in s.split(',') else np.nan for s in df['My_Date']] for e in unique_elements})

#     Apple  Banana  Cat  Dog  Elephant
#  0  Apple  Banana  Cat  NaN       NaN
#  1  Apple  Banana  Cat  NaN  Elephant
#  2    NaN  Banana  Cat  NaN       NaN
#  3  Apple  Banana  Cat  Dog       NaN
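For the indicator table, pandas also has a built-in that handles both the splitting and the discovery of distinct elements: `Series.str.get_dummies`. The sketch below is an alternative to the comprehensions above, not something from the original answer; the names `indicators` and `named` are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"My_Date": ["Apple,Cat,Banana",
                               "Banana,Cat,Apple,Elephant",
                               "Banana,Cat",
                               "Cat,Dog,Apple,Banana"]})

# One column per distinct element found after splitting on ',';
# get_dummies returns 0/1 integers, so convert them to booleans
indicators = df["My_Date"].str.get_dummies(sep=",").astype(bool)

# Replace True with the column's own name and False with NaN
named = indicators.apply(lambda col: col.map({True: col.name, False: np.nan}))
```

Here `indicators` corresponds to the True/False table and `named` to the table that holds the element names, without having to build the list of distinct elements by hand.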
