
Split a column of values delimited by a space into separate columns for each value in python

How can I convert the dataset

a    |    a b c d 
s    |    e f g h
f    |    i j k l

to

a | a | b | c | d
s | e | f | g | h
f | i | j | k | l

Using @chrisz's setup:

df.set_index('col1')['col2'].str.extractall(r'(\w+)')[0].unstack()

Output:

match  0  1  2  3
col1             
a      a  b  c  d
f      i  j  k  l
s      e  f  g  h
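Here is a self-contained version of the extractall approach, as a sketch that assumes the column names col1/col2 from the setup above. It also shows why the rows come back as a, f, s: unstack sorts the row labels, so an optional reindex restores the original order.

```python
import pandas as pd

# Sample frame in the shape the answer assumes (col1/col2 are assumed names)
df = pd.DataFrame({'col1': ['a', 's', 'f'],
                   'col2': ['a b c d', 'e f g h', 'i j k l']})

# extractall yields one row per regex match, indexed by (col1, match number);
# [0] selects the single capture group as a Series
matches = df.set_index('col1')['col2'].str.extractall(r'(\w+)')[0]

# unstack pivots the "match" level into columns 0, 1, 2, ... and sorts the row labels
wide = matches.unstack()

# Optional: restore the original row order (a, s, f) instead of the sorted one (a, f, s)
wide = wide.reindex(index=df['col1'])
print(wide)
```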

A simpler way is to use the expand=True argument of str.split.

import pandas as pd

# sample data
df = pd.DataFrame({'c1':['a','b','c'], 'c2':['a b c d','e f g h','i j k l']})

# transform into multiple columns
df = pd.concat([df['c1'],df['c2'].str.split(' ', expand=True)], axis=1)

print(df)

  c1  0  1  2  3
0  a  a  b  c  d
1  b  e  f  g  h
2  c  i  j  k  l
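A note on what expand=True does when rows don't all have the same number of values. This sketch uses made-up data where the last row has only two tokens. The result is as wide as the longest row, and shorter rows are padded with missing values (None for the usual object dtype).

```python
import pandas as pd

# Hypothetical data: the third row has fewer tokens than the others
df = pd.DataFrame({'c1': ['a', 'b', 'c'], 'c2': ['a b c d', 'e f g h', 'i j']})

# expand=True returns a DataFrame; missing trailing positions are filled with missing values
parts = df['c2'].str.split(' ', expand=True)
out = pd.concat([df['c1'], parts], axis=1)
print(out)
```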

Assuming your data really looks like this:

  col1     col2
0    a  a b c d
1    s  e f g h
2    f  i j k l
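If the data is still the raw pipe-delimited text from the question, here is one way to get it into the frame above. It is a sketch that assumes one record per line with a single "|" between the label and the values. A regex separator requires engine='python'.

```python
import io

import pandas as pd

# Raw text as shown in the question (assumed layout: label | space-separated values)
raw = """a    |    a b c d
s    |    e f g h
f    |    i j k l
"""

# The regex separator swallows the padding around "|"; col1/col2 are assumed names
df = pd.read_csv(io.StringIO(raw), sep=r'\s*\|\s*', engine='python',
                 header=None, names=['col1', 'col2'])
print(df)
```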

Using join with findall:

df.join(pd.DataFrame(df.col2.str.findall(r'\w+').values.tolist())).drop(columns='col2')

  col1  0  1  2  3
0    a  a  b  c  d
1    s  e  f  g  h
2    f  i  j  k  l
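Here is a self-contained sketch of the same findall/join idea. The one change is to pass index=df.index when building the wide frame. A frame built from a plain list gets a default 0..n-1 index, so join only lines up correctly when df also has a default index. Reusing df's index removes that assumption.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['a', 's', 'f'], 'col2': ['a b c d', 'e f g h', 'i j k l']})

# findall returns a Python list of tokens per row
tokens = df['col2'].str.findall(r'\w+')

# Build the wide frame on df's own index so join aligns rows even for a non-default index
wide = pd.DataFrame(tokens.tolist(), index=df.index)
out = df.join(wide).drop(columns='col2')
print(out)
```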

Consider this df:

import numpy as np
import pandas as pd

df = pd.DataFrame({'col1':[1,2], 'col2': ['10 20 30 40', '56 76 554 3243']})

    col1    col2
0   1       10 20 30 40
1   2       56 76 554 3243

You can split the space-separated numbers in col2 using str.split. You can either assign the resulting columns manually or generate the column labels with a range, as follows. I used the range version because you mentioned in a comment that you are dealing with about 99 columns in total.

parts = df.col2.str.split(expand=True)  # split once and reuse the result
cols = np.arange(parts.shape[1])
df[cols] = parts

You get:

    col1    col2            0   1   2   3
0   1       10 20 30 40     10  20  30  40
1   2       56 76 554 3243  56  76  554 3243
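Note that the new columns above still hold strings ('10', '20', ...), not integers. Here is a sketch of converting them and giving them readable names. The "v" prefix is a hypothetical naming choice, and astype(int) assumes every token is a valid integer.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2], 'col2': ['10 20 30 40', '56 76 554 3243']})

# The split pieces are strings; convert them to integers explicitly
parts = df['col2'].str.split(expand=True).astype(int)

# Hypothetical naming scheme: 0, 1, 2, ... become v0, v1, v2, ...
parts = parts.add_prefix('v')
out = df.join(parts)
print(out)
```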

Most compact:

df.drop(columns='c2').join(df.c2.str.split(expand=True))

  c1  0  1  2  3
0  a  a  b  c  d
1  b  e  f  g  h
2  c  i  j  k  l
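An equally short variant uses DataFrame.pop, which removes c2 from df and returns it in one step. Use this only if mutating df in place is acceptable, because df itself loses the c2 column.

```python
import pandas as pd

df = pd.DataFrame({'c1': ['a', 'b', 'c'], 'c2': ['a b c d', 'e f g h', 'i j k l']})

# pop removes c2 from df in place and returns it; join then runs on the remaining columns
out = df.join(df.pop('c2').str.split(expand=True))
print(out)
```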

Disregarding existing columns, option 1:

pd.DataFrame([[a] + b.split() for a, b in df.values])

   0  1  2  3  4
0  a  a  b  c  d
1  b  e  f  g  h
2  c  i  j  k  l
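If you want names instead of the positional labels 0..4, you can pass them when building the frame. This is a sketch with a hypothetical naming scheme ("label", then v1, v2, ...). It derives the names from the first row, so it assumes every row splits into the same number of tokens.

```python
import pandas as pd

df = pd.DataFrame({'c1': ['a', 'b', 'c'], 'c2': ['a b c d', 'e f g h', 'i j k l']})

# Each row becomes [label, token1, token2, ...]
rows = [[a] + b.split() for a, b in df[['c1', 'c2']].to_numpy()]

# Hypothetical names: "label" for the first column, then v1, v2, ... for the tokens
names = ['label'] + [f'v{i}' for i in range(1, len(rows[0]))]
out = pd.DataFrame(rows, columns=names)
print(out)
```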

Disregarding existing columns, option 2:

pd.DataFrame([' '.join(r).split() for r in df.values])

   0  1  2  3  4
0  a  a  b  c  d
1  b  e  f  g  h
2  c  i  j  k  l
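One caveat on the ' '.join(r) version: str.join only accepts strings, so it raises TypeError if any column is numeric. Here is a sketch with a hypothetical numeric id column that converts every cell with str first. As a result, the id ends up as a string too.

```python
import pandas as pd

# Hypothetical frame with a numeric id column; ' '.join(r) alone would fail on it
df = pd.DataFrame({'id': [1, 2], 'c1': ['a', 'b'], 'c2': ['a b c d', 'e f g h']})

# Convert every cell to str before joining, then split the combined row on whitespace
out = pd.DataFrame([' '.join(map(str, r)).split() for r in df.values])
print(out)
```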

If each row of that dataset is delimited by a newline character, you can do something like this:

dataset = '''
a    |    a b c d 
s    |    e f g h
f    |    i j k l
'''
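# strip() drops the blank first line; otherwise row.split() is empty and format() raises IndexError
for row in dataset.strip().splitlines():
    print('{} {} {} | {} | {} | {}'.format(*row.split()))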

The result is what you expected:

a | a | b | c | d
s | e | f | g | h
f | i | j | k | l
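The format string above hard-codes exactly four values per line. Here is a sketch that handles any number of values by treating "|" as whitespace and rejoining every token with " | ". The helper name to_pipe_row is hypothetical.

```python
dataset = '''
a    |    a b c d
s    |    e f g h
f    |    i j k l
'''


def to_pipe_row(line):
    # Treat "|" like whitespace, then rejoin all tokens with " | "
    return ' | '.join(line.replace('|', ' ').split())


# Skip blank lines, such as the one created by the leading newline of the string
rows = [to_pipe_row(line) for line in dataset.splitlines() if line.strip()]
print('\n'.join(rows))
```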

Assuming the input is a string, we can do:

import re
s = "a    |    a b c d"
# Collapse every run of whitespace and "|" characters into a single " | " separator
s = re.sub(r"[\s|]+", " | ", s.strip())
print(s)  # a | a | b | c | d

This should give you the desired output. For ordinary object-dtype columns, pandas' Series.str.replace with regex=True uses Python's re module under the hood, so the same kind of pattern should also work on a pandas column.
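Here is a sketch of the pandas version applied to a whole column. It assumes the pattern [\s|]+, which collapses any run of whitespace and pipes into one " | " separator.

```python
import pandas as pd

s = pd.Series(['a    |    a b c d', 's    |    e f g h'])

# str.replace with regex=True applies the pattern to every element, much like re.sub per value
out = s.str.strip().str.replace(r'[\s|]+', ' | ', regex=True)
print(out.tolist())
```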


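Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.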

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM