How to split a dataframe column into separate columns based on a condition

Question

I am trying to split the following dataframe column into separate columns. I want all the text in one column and the numbers split on whitespace.

df[0].head(10)

0                                                   []
1               [Andaman and Nicobar, 194, 52, 142, 0]
2        [Andhra Pradesh, 40,646, 19,814, 20,298, 534]
3                [Arunachal Pradesh, 609, 431, 175, 3]
4                   [Assam, 20,646, 6,490, 14,105, 51]
5                  [Bihar, 23,589, 8,767, 14,621, 201]
6                      [Chandigarh, 660, 169, 480, 11]
7              [Chhattisgarh, 4,964, 1,429, 3,512, 23]
8    [Dadra and Nagar Haveli and Daman, 585, 182, 4...
9                          [Daman and Diu, 0, 0, 0, 0]
Name: 0, dtype: object

If I split only on whitespace with `expand=True`, the numbers are split correctly, but the text is split across multiple columns. Because the text spans a different number of columns in different rows, I cannot concatenate it back together. The solution is presumably to write the right regex and split on that, but I can't work out the regex I need, so I'd appreciate any input.
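A minimal sketch of that regex idea, assuming each cell is a string shaped like `[Andaman and Nicobar, 194, 52, 142, 0]` (the whitespace split, where tokens such as `[Andaman` and `0]` keep their brackets, suggests the cells are strings rather than real lists). Strip the brackets, then split only where a comma is followed by whitespace. Thousands separators such as `40,646` have no space after the comma, so they stay intact. The Series `s` below is hypothetical sample data copied from the rows shown in the question.

```python
import pandas as pd

# Hypothetical sample: each cell is assumed to be a *string* that looks like a list
s = pd.Series([
    "[]",
    "[Andaman and Nicobar, 194, 52, 142, 0]",
    "[Andhra Pradesh, 40,646, 19,814, 20,298, 534]",
    "[Dadra and Nagar Haveli and Daman, 585, 182, 401, 2]",
])

# Remove the surrounding brackets, then split on "comma + whitespace" only.
# A pattern longer than one character is treated as a regex by str.split.
parts = s.str.strip("[]").str.split(r",\s+", expand=True)
print(parts)
```

Column 0 then holds the full name, and columns 1 to 4 hold the numbers as strings, still containing their thousands separators.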

df1 = df[0].str.split(' ', expand=True)
df1.head(10)
    0   1   2   3   4   5   6   7   8   9
0   []  None    None    None    None    None    None    None    None    None
1   [Andaman    and     Nicobar,    194,    52,     142,    0]  None    None    None
2   [Andhra     Pradesh,    40,646,     19,814,     20,298,     534]    None    None    None    None
3   [Arunachal  Pradesh,    609,    431,    175,    3]  None    None    None    None
4   [Assam,     20,646,     6,490,  14,105,     51]     None    None    None    None    None
5   [Bihar,     23,589,     8,767,  14,621,     201]    None    None    None    None    None
6   [Chandigarh,    660,    169,    480,    11]     None    None    None    None    None
7   [Chhattisgarh,  4,964,  1,429,  3,512,  23]     None    None    None    None    None
8   [Dadra  and     Nagar   Haveli  and     Daman,  585,    182,    401,    2]
9   [Daman  and     Diu,    0,  0,  0,  0]  None    None    None

The result I expect looks like this:

        0                                   1       2       3       4       5       6       7       8       9
    0   []                                  None    None    None    None    None    None    None    None    None
    1   [Andaman and Nicobar,               194,    52,     142,    0]      None    None    None    None    None
    2   [Andhra Pradesh,                    40,646, 19,814, 20,298, 534]    None    None    None    None    None
    3   [Arunachal Pradesh,                 609,    431,    175,    3]      None    None    None    None    None
    4   [Assam,                             20,646, 6,490,  14,105, 51]     None    None    None    None    None
    5   [Bihar,                             23,589, 8,767,  14,621, 201]    None    None    None    None    None
    6   [Chandigarh,                        660,    169,    480,    11]     None    None    None    None    None
    7   [Chhattisgarh,                      4,964,  1,429,  3,512,  23]     None    None    None    None    None
    8   [Dadra and Nagar Haveli and Daman,  585,    182,    401,    2]      None    None    None    None    None
    9   [Daman and Diu,                     0,      0,      0,      0]      None    None    None    None    None

Answer 1

You could use `str.replace` and `str.extract` to reshape your dataframe.

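# Leading run of non-digits = the name; then drop "[" and "," from it
names = df[0].str.extract(r'(\D+)').replace(r'\[|,', '', regex=True).rename(columns={0: 'names'})

# Remove the name and the closing bracket, then split the numbers on spaces
# (regex=True is required on pandas >= 2.0, where str.replace defaults to literal matching)
df_new = names.join(df[0].str.replace(r'\D+,', '', regex=True).str.strip(']').str.split(' ', expand=True))

print(df_new)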

                                  names 0        1        2        3     4
0                   Andaman and Nicobar       194,      52,     142,     0
1                        Andhra Pradesh    40,646,  19,814,  20,298,   534
2                     Arunachal Pradesh       609,     431,     175,     3
3                                 Assam    20,646,   6,490,  14,105,    51
4                                 Bihar    23,589,   8,767,  14,621,   201
5                            Chandigarh       660,     169,     480,    11
6                          Chhattisgarh     4,964,   1,429,   3,512,    23
7      Dadra and Nagar Haveli and Daman       585,     182,     4...  None
8                         Daman and Diu         0,       0,       0,     0
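The output above still carries trailing commas and thousands separators (`40,646,`). As an alternative sketch (not the method in this answer), a single `str.extract` pattern with named groups can pull out the name and the four numbers in one pass, after which the numbers can be converted with `pd.to_numeric`. The column names `c1` to `c4` are hypothetical placeholders, since the question does not say what the numbers mean; the pattern also assumes exactly four numbers per row and no digits in the names. Rows that do not match, such as `[]`, come back as NaN, which makes the numeric columns float.

```python
import pandas as pd

# Hypothetical sample data in the question's string format
s = pd.Series([
    "[]",
    "[Andaman and Nicobar, 194, 52, 142, 0]",
    "[Andhra Pradesh, 40,646, 19,814, 20,298, 534]",
    "[Daman and Diu, 0, 0, 0, 0]",
])

# An integer, optionally with thousands separators such as "40,646"
num = r"\d{1,3}(?:,\d{3})*"
# Named groups become column names; c1..c4 are placeholder names
pattern = (
    r"^\[(?P<name>\D+?),\s*"
    rf"(?P<c1>{num}),\s*(?P<c2>{num}),\s*(?P<c3>{num}),\s*(?P<c4>{num})\]$"
)

out = s.str.extract(pattern)
for col in ["c1", "c2", "c3", "c4"]:
    # Drop thousands separators, then convert; unmatched rows stay NaN
    out[col] = pd.to_numeric(out[col].str.replace(",", "", regex=False))

print(out)
```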


Answer 1 (score: 3, accepted, 2020-07-18 11:56:06)
