Splitting a column of a dataframe into multiple columns
I have a df containing option data, as shown below:
ABCD
1 AARTIIND 29APR21 1100 PE
2 AARTIIND 29APR21 1100 PE
3 AARTIIND 29APR21 1100 PE
4 AARTIIND-I
5 AARTIIND-I
6 AARTIIND-I
7 AARTIIND-I
8 AARTIIND-I
9 AARTIIND-I
10 AARTIIND-I
11 AARTIIND-I
12 AARTIIND-I
13 AARTIIND-I
14 AARTIIND-I
15 AARTIIND-I
16 AARTIIND-I
17 AARTIIND-I
18 AARTIIND-I
In the dataframe above, some of the rows are separated by spaces into 4 parts. Others are single words.
I intend to do the following:
e.g. AARTIIND 29APR21 1100 PE should be split into 4 columns, where column 1 contains AARTIIND, column 2 contains the date, column 3 contains the price, and column 4 contains the option type, i.e. PE.
e.g. AARTIIND-I is a single word, so column 1 will contain AARTIIND-I while columns 2, 3 and 4 will display NA.
After the transformation, the final df should look like this:
A B C D
AARTIIND 29-Apr-21 1100 PE
AARTIIND 29-Apr-21 1100 PE
AARTIIND 29-Apr-21 1100 PE
AARTIIND-I NA NA NA
AARTIIND-I NA NA NA
AARTIIND-I NA NA NA
AARTIIND-I NA NA NA
To split the strings on whitespace I use:
new_df[['A', 'B', 'C', 'D']] = new_df.ABCD.str.split(expand=True)
But since the spacing is not consistent, it gives me an error:
C:\Users\sadik\anaconda3\envs\Katwal_Asset_Management\python.exe "C:/Users/sadik/PycharmProjects/Katwal_Asset_Management/import data.py"
Traceback (most recent call last):
File "C:\Users\sadik\PycharmProjects\Katwal_Asset_Management\import data.py", line 6, in <module>
df[['A', 'B', 'C', 'D']] = df.ABCD.str.split(expand=True)
File "C:\Users\sadik\anaconda3\envs\Katwal_Asset_Management\lib\site-packages\pandas\core\frame.py", line 3600, in __setitem__
self._setitem_array(key, value)
File "C:\Users\sadik\anaconda3\envs\Katwal_Asset_Management\lib\site-packages\pandas\core\frame.py", line 3639, in _setitem_array
check_key_length(self.columns, key, value)
File "C:\Users\sadik\anaconda3\envs\Katwal_Asset_Management\lib\site-packages\pandas\core\indexers.py", line 428, in check_key_length
raise ValueError("Columns must be same length as key")
ValueError: Columns must be same length as key
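This error is raised when the frame returned by str.split(expand=True) does not have exactly as many columns as there are names on the left-hand side. The result has one column per token of the longest row, so a single row with more than four tokens is enough to trigger it. A minimal sketch for checking this, using a hypothetical series whose last row has an extra token (the real data may differ):

```python
import pandas as pd

# Hypothetical data: the last row has five tokens instead of four
s = pd.Series([
    "AARTIIND 29APR21 1100 PE",
    "AARTIIND-I",
    "AARTIIND 29APR21 1100 PE X",
])

# expand=True yields one column per token of the longest row
parts = s.str.split(expand=True)
n_cols = parts.shape[1]
print(n_cols)

# Count tokens per row to locate the rows that do not fit the 4-column layout
token_counts = s.str.split().str.len()
print(token_counts.value_counts())
```

Running the same two checks on the real ABCD column should show whether the split produces four columns or not.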
So is there any way to accomplish this using str.split, or is there another method in Python that achieves the desired output?
Try the following code:
import io
import pandas as pd
text = """ ABCD
1 AARTIIND 29APR21 1100 PE
2 AARTIIND 29APR21 1100 PE
3 AARTIIND 29APR21 1100 PE
4 AARTIIND-I
5 AARTIIND-I
6 AARTIIND-I
7 AARTIIND-I
8 AARTIIND-I
9 AARTIIND-I
10 AARTIIND-I
11 AARTIIND-I
12 AARTIIND-I
13 AARTIIND-I
14 AARTIIND-I
15 AARTIIND-I
16 AARTIIND-I
17 AARTIIND-I
18 AARTIIND-I"""
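df = pd.read_csv(io.StringIO(text))
# Split on runs of whitespace; shorter rows are padded with missing values
parts = df[' ABCD'].str.split(expand=True)
# Token 0 is the row number from the pasted text; keep the next four tokens
df = parts.iloc[:, 1:5]
df.columns = ['A', 'B', 'C', 'D']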
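If the dataframe already exists, the same idea can be applied to the ABCD column directly. The sketch below is one possible variant, not the only approach: it assumes the column holds only the option string (no leading row number), forces the split result to exactly four columns with reindex, and reformats the expiry from 29APR21 to 29-Apr-21 as in the desired output. The sample frame and the names parts, expiry and result are illustrative.

```python
import pandas as pd

# Hypothetical sample mirroring the question's ABCD column
df = pd.DataFrame(
    {"ABCD": ["AARTIIND 29APR21 1100 PE"] * 3 + ["AARTIIND-I"] * 4}
)

# Split on runs of whitespace; rows with fewer tokens get missing values
parts = df["ABCD"].str.split(expand=True)

# Force exactly four columns (0..3): missing ones are added as empty,
# and any tokens beyond the fourth are dropped
parts = parts.reindex(columns=range(4))
parts.columns = ["A", "B", "C", "D"]

# Convert 29APR21 to 29-Apr-21; rows without a date stay missing
expiry = pd.to_datetime(parts["B"].str.title(), format="%d%b%y", errors="coerce")
parts["B"] = expiry.dt.strftime("%d-%b-%y")

# Show missing values as the literal string "NA", as in the desired output
result = parts.fillna("NA")
print(result)
```

Because the column count is fixed by reindex before the names are assigned, the assignment no longer depends on how many tokens the longest row has. The month abbreviation produced by strftime depends on the active locale.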