Splitting a column of dataframe into multiple columns

I have a df containing option data, as shown below:

    ABCD
1   AARTIIND 29APR21 1100 PE
2   AARTIIND 29APR21 1100 PE
3   AARTIIND 29APR21 1100 PE
4   AARTIIND-I
5   AARTIIND-I
6   AARTIIND-I
7   AARTIIND-I
8   AARTIIND-I
9   AARTIIND-I
10  AARTIIND-I
11  AARTIIND-I
12  AARTIIND-I
13  AARTIIND-I
14  AARTIIND-I
15  AARTIIND-I
16  AARTIIND-I
17  AARTIIND-I
18  AARTIIND-I

In the above dataframe, some rows are separated by spaces into 4 parts. Others are single words.

I intend to do the following:

  1. Rows that are separated into 4 parts by spaces should be split into 4 individual columns, where each column contains one part.

e.g.: AARTIIND 29APR21 1100 PE should be split into 4 columns, where column 1 contains AARTIIND, column 2 contains the date, column 3 contains the price, and column 4 contains the option type, i.e. PE.

  2. Single words that are not separated should be placed in column 1, while the other columns should contain NA.

e.g.: AARTIIND-I is a single word, hence column 1 will contain AARTIIND-I while columns 2, 3 and 4 will display NA.

Hence, after the transformation, the final df should be displayed as:

A           B           C           D
AARTIIND    29-Apr-21   1100        PE
AARTIIND    29-Apr-21   1100        PE
AARTIIND    29-Apr-21   1100        PE
AARTIIND-I  NA          NA          NA
AARTIIND-I  NA          NA          NA
AARTIIND-I  NA          NA          NA
AARTIIND-I  NA          NA          NA

To split the strings on whitespace I use:

new_df[['A', 'B', 'C', 'D']] = new_df.ABCD.str.split(expand=True)

But since the spacing is not consistent, it gives me an error:

C:\Users\sadik\anaconda3\envs\Katwal_Asset_Management\python.exe "C:/Users/sadik/PycharmProjects/Katwal_Asset_Management/import data.py"
Traceback (most recent call last):
  File "C:\Users\sadik\PycharmProjects\Katwal_Asset_Management\import data.py", line 6, in <module>
    df[['A', 'B', 'C', 'D']] = df.ABCD.str.split(expand=True)
  File "C:\Users\sadik\anaconda3\envs\Katwal_Asset_Management\lib\site-packages\pandas\core\frame.py", line 3600, in __setitem__
    self._setitem_array(key, value)
  File "C:\Users\sadik\anaconda3\envs\Katwal_Asset_Management\lib\site-packages\pandas\core\frame.py", line 3639, in _setitem_array
    check_key_length(self.columns, key, value)
  File "C:\Users\sadik\anaconda3\envs\Katwal_Asset_Management\lib\site-packages\pandas\core\indexers.py", line 428, in check_key_length
    raise ValueError("Columns must be same length as key")
ValueError: Columns must be same length as key

So is there any way to accomplish the above task using str.split, or is there another method in Python to achieve the desired output?
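A minimal sketch of how this error can arise, assuming the real data contains at least one row with more than four whitespace-separated tokens. The sample above does not show such a row, so the third row here is hypothetical. With `expand=True`, `str.split` produces one column per token of the longest row, and assigning that result to four column names fails when the counts differ:

```python
import pandas as pd

# Hypothetical data: the last row has five tokens instead of four
df = pd.DataFrame({"ABCD": [
    "AARTIIND 29APR21 1100 PE",
    "AARTIIND-I",
    "AARTIIND 29APR21 1100 PE EXTRA",
]})

parts = df["ABCD"].str.split(expand=True)
n_cols = parts.shape[1]  # one column per token of the longest row

try:
    # Four keys on the left, five columns on the right
    df[["A", "B", "C", "D"]] = parts
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(n_cols, error_message)
```

The same mismatch would occur in the other direction if no row in the data had four tokens.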

Try the following code:

import io
import pandas as pd

text = """    ABCD
1   AARTIIND 29APR21 1100 PE
2   AARTIIND 29APR21 1100 PE
3   AARTIIND 29APR21 1100 PE
4   AARTIIND-I
5   AARTIIND-I
6   AARTIIND-I
7   AARTIIND-I
8   AARTIIND-I
9   AARTIIND-I
10  AARTIIND-I
11  AARTIIND-I
12  AARTIIND-I
13  AARTIIND-I
14  AARTIIND-I
15  AARTIIND-I
16  AARTIIND-I
17  AARTIIND-I
18  AARTIIND-I"""

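# The text contains no commas, so each line is read as one string in a single column
df = pd.read_csv(io.StringIO(text))

# Split on runs of whitespace (the default) so repeated spaces do not create empty fields;
# rows with fewer parts are padded with missing values
df = df.iloc[:, 0].str.split(expand=True)

# The first field is the row number carried over from the pasted text
df.columns = ['row', 'A', 'B', 'C', 'D']
df = df.set_index('row')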
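If the dataframe already has an `ABCD` column, as in the question, a variant that always yields exactly four columns might look like the sketch below. It is not part of the answer above. It assumes the first three tokens are always symbol, expiry and strike, and that month abbreviations are formatted under an English locale. `n=3` limits the split to three cuts, and `reindex` pads the result to four columns even when every row is a single word.

```python
import pandas as pd

df = pd.DataFrame({"ABCD": [
    "AARTIIND 29APR21 1100 PE",
    "AARTIIND 29APR21 1100 PE",
    "AARTIIND-I",
    "AARTIIND-I",
]})

# n=3 caps the split at three cuts, so no row can yield more than four parts
parts = df["ABCD"].str.split(n=3, expand=True)

# reindex guarantees exactly four columns, filling absent ones with missing values
parts = parts.reindex(columns=range(4))
parts.columns = ["A", "B", "C", "D"]

# Optional: show the expiry as 29-Apr-21; rows without a date stay missing
expiry = pd.to_datetime(parts["B"], format="%d%b%y")
parts["B"] = expiry.dt.strftime("%d-%b-%y")

result = parts
print(result)
```

Because the column count is fixed at four, `df[['A', 'B', 'C', 'D']] = parts` would also work here without raising the length error.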
