
Split a pandas column of comma-separated values while maintaining the order

I have the following column in a dataframe:

column_1
en-us,en-en
pr,en-us,en-en,br
ar-ar,pr,en-en

I want to split that column (this can be done with .str.split), but using .str.split I will get:

column_1 | column_2 | column_3 | column_4
en-us      en-en
pr         en-us      en-en      br
ar-ar      pr         en-en

And what I need is:

column_1 | column_2 | column_3 | column_4
en-us      en-en      
en-us      en-en      br         pr
ar-ar      en-en                 pr

Is there any automatic way of doing this?

IIUC, you can do this by passing a list of dictionaries to the default pd.DataFrame constructor. For example, with s being the column as a Series,

df = pd.DataFrame(s.str.split(',').transform(lambda x: {k:k for k in x}).tolist())

yields

    ar-ar   br  en-en   en-us   pr
0   NaN     NaN en-en   en-us   NaN
1   NaN     br  en-en   en-us   pr
2   ar-ar   NaN en-en   NaN     pr
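
To reproduce this from scratch, here is a minimal sketch. It assumes s is a Series holding the column from the question, and it builds the dictionaries with a plain list comprehension instead of transform. The explicit column sort is an addition, not part of the original answer: the constructor may otherwise order the columns by first appearance of each key, which would not match the alphabetical layout shown above.

```python
import pandas as pd

# Assumed input: the column from the question, stored as a Series named s.
s = pd.Series(["en-us,en-en", "pr,en-us,en-en,br", "ar-ar,pr,en-en"], name="col1")

# One dict per row, mapping each code to itself.
rows = [{code: code for code in cell.split(",")} for cell in s]

df = pd.DataFrame(rows)

# Sort the columns alphabetically so the layout matches the output above
# regardless of the order in which the keys were first seen.
df = df[sorted(df.columns)]
print(df)
```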

Notice that it is trivial to reorder the data frame according to your needs, e.g.

>>> df[['en-en', 'en-us', 'br', 'pr']]
    en-en   en-us   br  pr
0   en-en   en-us   NaN NaN
1   en-en   en-us   br  pr
2   en-en   NaN     NaN pr

And if you want empty strings rather than NaNs, just use .fillna('')

df[['en-en', 'en-us', 'br', 'pr']].fillna('')

    en-en   en-us   br  pr
0   en-en   en-us       
1   en-en   en-us   br  pr
2   en-en           pr
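
If every code should always land in the same positional column, as in the question, one option is to fix the order once and reindex. This is only a sketch: the order list and the column_N names below are illustrative assumptions, and reindex is used instead of plain column selection so that a code missing from the data still gets an (empty) column.

```python
import pandas as pd

# Assumed input: the column from the question.
s = pd.Series(["en-us,en-en", "pr,en-us,en-en,br", "ar-ar,pr,en-en"])
df = pd.DataFrame([{c: c for c in cell.split(",")} for cell in s])

# Hypothetical fixed order; adjust to whatever order is actually required.
order = ["ar-ar", "en-us", "en-en", "br", "pr"]

# reindex puts the columns in this order and would create an all-NaN column
# for any listed code that never occurs in the data.
out = df.reindex(columns=order).fillna("")

# Optional: positional column names in the style used in the question.
out.columns = [f"column_{i}" for i in range(1, len(order) + 1)]
print(out)
```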

Explanation

Let's break down the following statement:

s.str.split(',').transform(lambda x: {k:k for k in x}).tolist()

First of all, s.str.split(',') does what you already know: it splits each string using , as the separator. This yields the following series:

0            [en-us, en-en]
1    [pr, en-us, en-en, br]
2        [ar-ar, pr, en-en]
Name: col1, dtype: object

Now, we want to change each of these elements into a {key: value} structure. For that, we use transform, passing a function to it:

s.str.split(',').transform(function)

where function = lambda x: {k:k for k in x}. So basically we run this function for the input [en-us, en-en], then for [pr, en-us, en-en, br], etc. The output is

0                 {'en-en': 'en-en', 'en-us': 'en-us'}
1    {'br': 'br', 'en-en': 'en-en', 'en-us': 'en-us...
2     {'en-en': 'en-en', 'ar-ar': 'ar-ar', 'pr': 'pr'}

Now, we just use tolist() to get a list of these values and pass it to the pd.DataFrame() constructor. The constructor knows how to deal with lists of dictionaries pretty well: it assigns values based on the keys of the dictionaries for each row. Whenever no key/value is found for a row, it just uses NaN.
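
The same breakdown can be followed one step at a time. The sketch below is an illustration under two assumptions: s is rebuilt from the question's data, and Series.map is swapped in for transform, since for an element-wise function like this one both are expected to give one dict per row.

```python
import pandas as pd

# Assumed input: the column from the question.
s = pd.Series(["en-us,en-en", "pr,en-us,en-en,br", "ar-ar,pr,en-en"], name="col1")

# Step 1: split each string into a list of codes.
lists = s.str.split(",")

# Step 2: turn each list into a {code: code} dict; map calls the function
# once per element of the Series.
dicts = lists.map(lambda codes: {k: k for k in codes})

# Step 3: convert the Series into a plain Python list of dicts, one per row.
records = dicts.tolist()

# Step 4: the constructor aligns values by key and leaves NaN where a row
# has no entry for a given key.
df = pd.DataFrame(records)
print(df)
```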
