How to split dataframe column into separate columns based on condition

I am trying to split the following dataframe column into separate columns. I want all the text in one column, and the numbers split on whitespace.

df[0].head(10)

0                                                   []
1               [Andaman and Nicobar, 194, 52, 142, 0]
2        [Andhra Pradesh, 40,646, 19,814, 20,298, 534]
3                [Arunachal Pradesh, 609, 431, 175, 3]
4                   [Assam, 20,646, 6,490, 14,105, 51]
5                  [Bihar, 23,589, 8,767, 14,621, 201]
6                      [Chandigarh, 660, 169, 480, 11]
7              [Chhattisgarh, 4,964, 1,429, 3,512, 23]
8    [Dadra and Nagar Haveli and Daman, 585, 182, 4...
9                          [Daman and Diu, 0, 0, 0, 0]
Name: 0, dtype: object

If I split on whitespace alone and expand, the numbers are split correctly, but the text is also split across multiple columns. Since the text spans a different number of columns in each row, I cannot concatenate the pieces back together. The solution is presumably to split on the right regex, but I have not been able to work out the pattern, so I would appreciate any input.

df1 = df[0].str.split(' ', expand= True)
df1.head(10)
    0   1   2   3   4   5   6   7   8   9
0   []  None    None    None    None    None    None    None    None    None
1   [Andaman    and     Nicobar,    194,    52,     142,    0]  None    None    None
2   [Andhra     Pradesh,    40,646,     19,814,     20,298,     534]    None    None    None    None
3   [Arunachal  Pradesh,    609,    431,    175,    3]  None    None    None    None
4   [Assam,     20,646,     6,490,  14,105,     51]     None    None    None    None    None
5   [Bihar,     23,589,     8,767,  14,621,     201]    None    None    None    None    None
6   [Chandigarh,    660,    169,    480,    11]     None    None    None    None    None
7   [Chhattisgarh,  4,964,  1,429,  3,512,  23]     None    None    None    None    None
8   [Dadra  and     Nagar   Haveli  and     Daman,  585,    182,    401,    2]
9   [Daman  and     Diu,    0,  0,  0,  0]  None    None    None

The result I am expecting looks like this:

        0                                   1       2       3       4       5       6       7       8       9
    0   []                                  None    None    None    None    None    None    None    None    None
    1   [Andaman and Nicobar,               194,    52,     142,    0]      None    None    None    None    None
    2   [Andhra Pradesh,                    40,646, 19,814, 20,298, 534]    None    None    None    None    None
    3   [Arunachal Pradesh,                 609,    431,    175,    3]      None    None    None    None    None
    4   [Assam,                             20,646, 6,490,  14,105, 51]     None    None    None    None    None
    5   [Bihar,                             23,589, 8,767,  14,621, 201]    None    None    None    None    None
    6   [Chandigarh,                        660,    169,    480,    11]     None    None    None    None    None
    7   [Chhattisgarh,                      4,964,  1,429,  3,512,  23]     None    None    None    None    None
    8   [Dadra and Nagar Haveli and Daman,  585,    182,    401,    2]      None    None    None    None    None
    9   [Daman and Diu,                     0,      0,      0,      0]      None    None    None    None    None

You could use str.extract and str.replace to reshape your dataframe.

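# Take the leading run of non-digits (the name), then drop the '[' and ','.
names = df[0].str.extract(r'(\D+)').replace(r'\[|,', '', regex=True).rename(columns={0: 'names'})

# Remove the name and strip ']'; regex=True is needed on pandas >= 2.0, where str.replace is literal by default.
df_new = names.join(df[0].str.replace(r'\D+,', '', regex=True).str.strip(']').str.split(' ', expand=True))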

print(df_new)

                                  names 0        1        2        3     4
0                   Andaman and Nicobar       194,      52,     142,     0
1                        Andhra Pradesh    40,646,  19,814,  20,298,   534
2                     Arunachal Pradesh       609,     431,     175,     3
3                                 Assam    20,646,   6,490,  14,105,    51
4                                 Bihar    23,589,   8,767,  14,621,   201
5                            Chandigarh       660,     169,     480,    11
6                          Chhattisgarh     4,964,   1,429,   3,512,    23
7      Dadra and Nagar Haveli and Daman       585,     182,     4...  None
8                         Daman and Diu         0,       0,       0,     0
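
A possible alternative, offered as a sketch rather than a tested drop-in: in the sample data the field separator is a comma followed by a space, while the thousands separator is a bare comma, so splitting on `,\s+` keeps both the multi-word names and values like 40,646 intact. This assumes each row is a string such as "[Assam, 20,646, ...]" (which the working `.str.split` in the question suggests). The column names `name`, `c1` to `c4` are placeholders I made up, since the source does not name the columns.

```python
import pandas as pd

# A small sample in the same shape as the question's column (assumed to be strings).
df = pd.DataFrame({0: [
    "[]",
    "[Andaman and Nicobar, 194, 52, 142, 0]",
    "[Andhra Pradesh, 40,646, 19,814, 20,298, 534]",
    "[Assam, 20,646, 6,490, 14,105, 51]",
    "[Daman and Diu, 0, 0, 0, 0]",
]})

# Strip the surrounding brackets, then split only on "comma + whitespace".
# A comma inside a number ("40,646") has no space after it, so it is not a split point.
parts = df[0].str.strip("[]").str.split(r",\s+", expand=True)

# Placeholder column names (hypothetical, not from the original post).
parts.columns = ["name", "c1", "c2", "c3", "c4"]

# Optionally drop the thousands separators and convert to numbers.
# The empty "[]" row has no values, so its numeric cells become NaN.
for col in ["c1", "c2", "c3", "c4"]:
    cleaned = parts[col].str.replace(",", "", regex=False)
    parts[col] = pd.to_numeric(cleaned, errors="coerce")

print(parts)
```

If any name itself contained a comma followed by a space, this split would break that row, so it is worth checking the real data before relying on it.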
