How to split a dataframe column into separate columns based on a condition
I am trying to split the following dataframe column into separate columns. I want all the text in one column and the numbers split on white space.
df[0].head(10)
0 []
1 [Andaman and Nicobar, 194, 52, 142, 0]
2 [Andhra Pradesh, 40,646, 19,814, 20,298, 534]
3 [Arunachal Pradesh, 609, 431, 175, 3]
4 [Assam, 20,646, 6,490, 14,105, 51]
5 [Bihar, 23,589, 8,767, 14,621, 201]
6 [Chandigarh, 660, 169, 480, 11]
7 [Chhattisgarh, 4,964, 1,429, 3,512, 23]
8 [Dadra and Nagar Haveli and Daman, 585, 182, 4...
9 [Daman and Diu, 0, 0, 0, 0]
Name: 0, dtype: object
If I split on white space alone and expand, the numbers are split correctly, but the text is also split across multiple columns. Since the text spans a different number of columns in each row, I cannot concatenate the pieces back together. The solution is presumably to split on the right regex, but I have been unable to work it out and would appreciate suggestions.
df1 = df[0].str.split(' ', expand= True)
df1.head(10)
0 1 2 3 4 5 6 7 8 9
0 [] None None None None None None None None None
1 [Andaman and Nicobar, 194, 52, 142, 0] None None None
2 [Andhra Pradesh, 40,646, 19,814, 20,298, 534] None None None None
3 [Arunachal Pradesh, 609, 431, 175, 3] None None None None
4 [Assam, 20,646, 6,490, 14,105, 51] None None None None None
5 [Bihar, 23,589, 8,767, 14,621, 201] None None None None None
6 [Chandigarh, 660, 169, 480, 11] None None None None None
7 [Chhattisgarh, 4,964, 1,429, 3,512, 23] None None None None None
8 [Dadra and Nagar Haveli and Daman, 585, 182, 401, 2]
9 [Daman and Diu, 0, 0, 0, 0] None None None
The result I am expecting should look like this:
0 1 2 3 4 5 6 7 8 9
0 [] None None None None None None None None None
1 [Andaman and Nicobar, 194, 52, 142, 0] None None None None None
2 [Andhra Pradesh, 40,646, 19,814, 20,298, 534] None None None None None
3 [Arunachal Pradesh, 609, 431, 175, 3] None None None None None
4 [Assam, 20,646, 6,490, 14,105, 51] None None None None None
5 [Bihar, 23,589, 8,767, 14,621, 201] None None None None None
6 [Chandigarh, 660, 169, 480, 11] None None None None None
7 [Chhattisgarh, 4,964, 1,429, 3,512, 23] None None None None None
8 [Dadra and Nagar Haveli and Daman, 585, 182, 401, 2] None None None None None
9 [Daman and Diu, 0, 0, 0, 0] None None None None None
You could use str.replace and str.extract to reshape your dataframe.
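# Take the leading run of non-digit characters as the name, then drop "[" and ","
names = df[0].str.extract(r'(\D+)').replace(r'\[|,', '', regex=True).rename(columns={0: 'names'})
# Remove the name part (regex=True is required on recent pandas), then split the numbers on spaces
df_new = names.join(df[0].str.replace(r'\D+,', '', regex=True).str.strip(']').str.split(' ', expand=True))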
print(df_new)
names 0 1 2 3 4
0 Andaman and Nicobar 194, 52, 142, 0
1 Andhra Pradesh 40,646, 19,814, 20,298, 534
2 Arunachal Pradesh 609, 431, 175, 3
3 Assam 20,646, 6,490, 14,105, 51
4 Bihar 23,589, 8,767, 14,621, 201
5 Chandigarh 660, 169, 480, 11
6 Chhattisgarh 4,964, 1,429, 3,512, 23
7 Dadra and Nagar Haveli and Daman 585, 182, 4... None
8 Daman and Diu 0, 0, 0, 0
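As an alternative sketch, not part of the answer above: the field separator in these rows is a comma followed by a space, while the thousands separator inside a number such as 40,646 has no space after it. Splitting on that pattern keeps each name and each number whole. The sample frame and the column names n1 to n4 are assumptions made for illustration, and the code assumes each cell is a string as shown in the question.

```python
import pandas as pd

# Hypothetical sample mirroring a few rows from the question; each cell is a string.
df = pd.DataFrame({0: [
    "[]",
    "[Andaman and Nicobar, 194, 52, 142, 0]",
    "[Andhra Pradesh, 40,646, 19,814, 20,298, 534]",
    "[Dadra and Nagar Haveli and Daman, 585, 182, 401, 2]",
]})

# Drop the surrounding brackets and discard rows that are empty afterwards.
cleaned = df[0].str.strip("[]")
cleaned = cleaned[cleaned != ""].reset_index(drop=True)

# Split on a comma followed by whitespace. "40,646" has no space after
# its comma, so it is not split.
parts = cleaned.str.split(r",\s+", expand=True)

# Remove thousands separators and convert the number columns to numeric values.
numbers = parts.iloc[:, 1:].apply(
    lambda s: pd.to_numeric(s.str.replace(",", "", regex=False))
)

result = pd.concat([parts.iloc[:, [0]], numbers], axis=1)
result.columns = ["names", "n1", "n2", "n3", "n4"]  # assumed column names
print(result)
```

This relies on every row having exactly one name followed by four numbers. A name that itself contains a comma followed by a space would need a different pattern.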