Split one DataFrame column into multiple columns

Question

I have a df containing option data as follows:

    ABCD
1   AARTIIND 29APR21 1100 PE
2   AARTIIND 29APR21 1100 PE
3   AARTIIND 29APR21 1100 PE
4   AARTIIND-I
5   AARTIIND-I
6   AARTIIND-I
7   AARTIIND-I
8   AARTIIND-I
9   AARTIIND-I
10  AARTIIND-I
11  AARTIIND-I
12  AARTIIND-I
13  AARTIIND-I
14  AARTIIND-I
15  AARTIIND-I
16  AARTIIND-I
17  AARTIIND-I
18  AARTIIND-I

In the above dataframe, some of the rows are separated by spaces into 4 parts. The others are single words.

I intend to do the following:

Rows whose value is separated into 4 parts by spaces should be split into 4 individual columns, where each column contains one part.

e.g. AARTIIND 29APR21 1100 PE should be split into 4 columns, where column 1 will contain AARTIIND, column 2 the date, column 3 the price, and column 4 the type of option, i.e. PE.

Single words which are not separated should be inserted in column 1, while the other columns should contain NA.

e.g. AARTIIND-I is a single word, hence column 1 will contain AARTIIND-I while columns 2, 3 and 4 will display NA.

Hence, after the transformation, the final df should be displayed as:

A           B           C           D
AARTIIND    29-Apr-21   1100        PE
AARTIIND    29-Apr-21   1100        PE
AARTIIND    29-Apr-21   1100        PE
AARTIIND-I  NA          NA          NA
AARTIIND-I  NA          NA          NA
AARTIIND-I  NA          NA          NA
AARTIIND-I  NA          NA          NA

To split the strings on whitespace I use:

new_df[['A', 'B', 'C', 'D']] = new_df.ABCD.str.split(expand=True)

But since the spacing is not consistent, it gives me an error:

C:\Users\sadik\anaconda3\envs\Katwal_Asset_Management\python.exe "C:/Users/sadik/PycharmProjects/Katwal_Asset_Management/import data.py"
Traceback (most recent call last):
  File "C:\Users\sadik\PycharmProjects\Katwal_Asset_Management\import data.py", line 6, in <module>
    df[['A', 'B', 'C', 'D']] = df.ABCD.str.split(expand=True)
  File "C:\Users\sadik\anaconda3\envs\Katwal_Asset_Management\lib\site-packages\pandas\core\frame.py", line 3600, in __setitem__
    self._setitem_array(key, value)
  File "C:\Users\sadik\anaconda3\envs\Katwal_Asset_Management\lib\site-packages\pandas\core\frame.py", line 3639, in _setitem_array
    check_key_length(self.columns, key, value)
  File "C:\Users\sadik\anaconda3\envs\Katwal_Asset_Management\lib\site-packages\pandas\core\indexers.py", line 428, in check_key_length
    raise ValueError("Columns must be same length as key")
ValueError: Columns must be same length as key
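
A minimal sketch of how this error can arise. `str.split(expand=True)` returns as many columns as the longest row has parts, and the assignment fails whenever that count differs from the number of names on the left-hand side. The 18 rows shown above would split into exactly four columns, so the sketch assumes the full data splits into a different number; the two-row sample here is hypothetical and chosen so that no row has more than three parts.

```python
import pandas as pd

# Hypothetical sample (not the asker's data): the longest row has 3 parts.
df = pd.DataFrame({"ABCD": ["AARTIIND 29APR21 1100", "AARTIIND-I"]})

# expand=True yields one column per part of the longest row.
parts = df["ABCD"].str.split(expand=True)
n_cols = parts.shape[1]

try:
    # Four names on the left, three columns on the right.
    df[["A", "B", "C", "D"]] = parts
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(n_cols, error_message)
```

Checking `parts.shape[1]` on the real data before assigning shows how many columns the split actually produced.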

So is there any way I can accomplish the above task using str.split, or is there any other method in Python to achieve the desired output?

Answer 1

Try the following code:

import io
import pandas as pd

text = """    ABCD
1   AARTIIND 29APR21 1100 PE
2   AARTIIND 29APR21 1100 PE
3   AARTIIND 29APR21 1100 PE
4   AARTIIND-I
5   AARTIIND-I
6   AARTIIND-I
7   AARTIIND-I
8   AARTIIND-I
9   AARTIIND-I
10  AARTIIND-I
11  AARTIIND-I
12  AARTIIND-I
13  AARTIIND-I
14  AARTIIND-I
15  AARTIIND-I
16  AARTIIND-I
17  AARTIIND-I
18  AARTIIND-I"""

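# The sample has no commas, so read_csv loads each line as one string
# in a single column whose name is the header line, '    ABCD'.
df = pd.read_csv(io.StringIO(text))

# Split on single spaces. The leading row number and the extra spaces
# after it also produce pieces, so the widest rows yield 7 columns.
df = df['    ABCD'].str.split(' ', expand=True)

# Name the 7 resulting columns.
df.columns = ['A','B','C','D','E','F','G']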
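
The code above leaves the row number and empty strings in the first columns. As an alternative sketch (not the answerer's method), the four-column layout from the question can be reached by splitting on any run of whitespace, capping the split at four pieces, and forcing the result to exactly four columns with `reindex`. The small frame and the names `parts`, `dates` and `result` are assumptions made for illustration, as is the `29APR21` date format.

```python
import pandas as pd

# Assumed input: one string column named ABCD, as in the question.
df = pd.DataFrame({
    "ABCD": [
        "AARTIIND 29APR21 1100 PE",
        "AARTIIND 29APR21 1100 PE",
        "AARTIIND-I",
        "AARTIIND-I",
    ]
})

# Split on any run of whitespace, at most 3 times (so at most 4 pieces).
# Shorter rows are padded with missing values.
parts = df["ABCD"].str.split(n=3, expand=True)

# Guarantee exactly four columns even if no row produced four pieces.
parts = parts.reindex(columns=range(4))
parts.columns = ["A", "B", "C", "D"]

# Optional: rewrite 29APR21 as 29-Apr-21; unparseable or missing values stay missing.
dates = pd.to_datetime(parts["B"], format="%d%b%y", errors="coerce")
parts["B"] = dates.dt.strftime("%d-%b-%y")

result = parts
print(result)
```

Missing cells print as None or NaN rather than the literal text NA; `result.fillna("NA")` would turn them into strings if that is what is wanted. With `n=3`, any row that has more than four parts keeps the remainder together in column D.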

Answer 1, posted 2021-12-16 18:06:54
