Need to split variable length data in a pandas dataframe column into multiple columns

I have a two-column dataframe like this:

ITEM        REFNUMS
1   00000299    0036701923024762922029229294652954429569295832...
2   00000655    NaN
24  00001791    00016027123076000158004563065131972
25  00001805    00016027123076000158004563065131972
26  00001813    00016027123076000158004563065131972
27  00001821    00016027123076000158004563065131972
28  00001937    0142530521316303164702509000510012201310027820...

I would like to split the REFNUMS column into fixed-size parts and add them to the existing dataframe if possible, since I need to retain the row index and the matching ITEM number. When it is not NaN, the data in REFNUMS has a length divisible by 5, so for example row 1 is 78 sets of 5.

data_len = (data['REFNUMS'].str.len())/5 

This gives:

0         NaN
1        78.0
2         NaN
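
For anyone who wants to reproduce that check, here is a minimal self-contained sketch. The REFNUMS values in the post are truncated, so the strings below are shortened stand-ins and the names lengths, n_chunks and is_whole are only illustrative.

```python
import numpy as np
import pandas as pd

# Illustrative data only: the real REFNUMS values are truncated in the post,
# so these strings are stand-ins whose lengths are multiples of 5.
data = pd.DataFrame(
    {
        "ITEM": ["00000299", "00000655", "00001791"],
        "REFNUMS": ["0036701923", np.nan, "00016027123076000158004563065131972"],
    },
    index=[1, 2, 24],
)

lengths = data["REFNUMS"].str.len()   # NaN rows stay NaN
n_chunks = lengths / 5                # number of 5-character groups per row

# True when every non-missing string splits evenly into groups of 5.
is_whole = bool((lengths.dropna() % 5 == 0).all())
```

If is_whole comes back False, some row has leftover characters and a fixed-width split would silently produce a short last group.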

I would appreciate any suggestions on how to do this.

IIUC, you can use str.extractall to get the groups of 5 digits, drop the extra column level left by unstack, and then join the result back onto the original frame:

In [168]: r = df.REFNUMS.str.extractall(r"(\d{1,5})").unstack()

In [169]: r.columns = r.columns.droplevel(0)

In [170]: df.join(r)
Out[170]: 
    ITEM                                            REFNUMS      0      1      2      3      4      5      6      7      8     9
1    299  0036701923024762922029229294652954429569295832...  00367  01923  02476  29220  29229  29465  29544  29569  29583     2
2    655                                                NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN   NaN
24  1791                00016027123076000158004563065131972  00016  02712  30760  00158  00456  30651  31972   None   None  None
25  1805                00016027123076000158004563065131972  00016  02712  30760  00158  00456  30651  31972   None   None  None
26  1813                00016027123076000158004563065131972  00016  02712  30760  00158  00456  30651  31972   None   None  None
27  1821                00016027123076000158004563065131972  00016  02712  30760  00158  00456  30651  31972   None   None  None
28  1937  0142530521316303164702509000510012201310027820...  01425  30521  31630  31647  02509  00051  00122  01310  02782     0
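
In the output above, column 9 holds single digits (2 and 0). That looks like a side effect of \d{1,5} matching the leftover digit of the truncated sample strings, not something the real data would produce. As a hedged variation on the same idea, the sketch below uses \d{5} so that only complete groups are kept. The sample strings are shortened stand-ins and the REF_ column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Illustrative data only: stand-ins for the truncated values in the post.
df = pd.DataFrame(
    {
        "ITEM": ["00000299", "00000655", "00001791"],
        "REFNUMS": ["0036701923", np.nan, "00016027123076000158004563065131972"],
    },
    index=[1, 2, 24],
)

# One row per 5-digit match, indexed by (original row label, match number).
# Rows where REFNUMS is NaN produce no matches at all.
matches = df["REFNUMS"].str.extractall(r"(\d{5})")

# Selecting capture group 0 gives a Series, so unstack() yields flat columns
# (one per match number) and no droplevel step is needed.
wide = matches[0].unstack()
wide.columns = [f"REF_{i}" for i in wide.columns]  # hypothetical names

# join is a left join on the index, so NaN rows survive with NaN in every REF_ column.
result = df.join(wide)
```

Because the join is on the index, the original row labels and ITEM values stay attached to their groups, and shorter rows are padded with missing values on the right.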
