
Split multiple columns in a pandas DataFrame by delimiter

I have survey data that, annoyingly, returns multiple-choice questions in the following way. It's in an Excel sheet with about 60 columns, each holding anywhere from one to several responses separated by /. This is what I have so far. Is there a quicker way to do this, without repeating it for each individual column?

import pandas as pd

data = {'q1': ['one', 'two', 'three'],
        'q2': ['one/two/three', 'a/b/c', 'd/e/f'],
        'q3': ['a/b/c', 'd/e/f', 'g/h/i']}

df = pd.DataFrame(data)

# Split each multi-answer column into one column per answer.
df[['q2a', 'q2b', 'q2c']] = df['q2'].str.split('/', expand=True)
df[['q3a', 'q3b', 'q3c']] = df['q3'].str.split('/', expand=True)

# Drop the original combined columns.
clean_df = df.drop(columns=['q2', 'q3'])

We can use a list comprehension with add_prefix, then use pd.concat to concatenate everything into your final df:

splits = [df[col].str.split(pat='/', expand=True).add_prefix(col) for col in df.columns]
clean_df = pd.concat(splits, axis=1)

output:

     q10  q20  q21    q22 q30 q31 q32
0    one  one  two  three   a   b   c
1    two    a    b      c   d   e   f
2  three    d    e      f   g   h   i
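
Since the question mentions responses ranging from single to multiple, here is a minimal sketch of how the same list comprehension behaves when rows hold different numbers of choices. The data below is hypothetical and not from the question; the point is that str.split with expand=True pads shorter rows with missing values rather than failing.

```python
import pandas as pd

# Hypothetical survey data: q2 holds between one and three choices per row.
df = pd.DataFrame({
    "q1": ["one", "two", "three"],
    "q2": ["a/b/c", "d", "e/f"],
})

# Same approach as above: split every column, prefix the pieces with the column name.
splits = [df[col].str.split("/", expand=True).add_prefix(col) for col in df.columns]
clean_df = pd.concat(splits, axis=1)

# Rows with fewer choices are padded with missing values in the trailing columns.
print(clean_df)
```

The number of output columns per question is driven by the row with the most choices, so it may differ from one question to the next.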

If you actually want your column names to be suffixed by a letter, you can do the following with string.ascii_lowercase:

from string import ascii_lowercase

dfs = []
for col in df.columns:
    d = df[col].str.split('/', expand=True)
    c = d.shape[1]
    d.columns = [col + l for l in ascii_lowercase[:c]]
    dfs.append(d)
    
clean_df = pd.concat(dfs, axis=1)

output:

     q1a  q2a  q2b    q2c q3a q3b q3c
0    one  one  two  three   a   b   c
1    two    a    b      c   d   e   f
2  three    d    e      f   g   h   i
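
With around 60 columns read from Excel, some may be numeric or contain no delimiter at all, and the .str accessor raises on non-string columns. The following is one possible sketch that wraps the loop above in a helper and passes such columns through unchanged. The function name split_columns, the sep parameter, and the sample column n are assumptions made for illustration, not part of the original answer.

```python
from string import ascii_lowercase

import pandas as pd


def split_columns(df, sep="/"):
    """Split text columns that contain `sep`; pass every other column through."""
    parts = []
    for col in df.columns:
        s = df[col]
        is_text = pd.api.types.is_object_dtype(s) or pd.api.types.is_string_dtype(s)
        if is_text and s.astype(str).str.contains(sep, regex=False).any():
            d = s.str.split(sep, expand=True)
            # Limitation: only 26 suffixes exist, so more than 26 pieces will raise.
            d.columns = [f"{col}{letter}" for letter in ascii_lowercase[: d.shape[1]]]
            parts.append(d)
        else:
            # Numeric columns and single-answer columns keep their original name.
            parts.append(s.to_frame())
    return pd.concat(parts, axis=1)


# Hypothetical input: one single-answer column, one multi-answer column, one numeric column.
survey = pd.DataFrame({
    "q1": ["one", "two", "three"],
    "q2": ["one/two/three", "a/b/c", "d/e/f"],
    "n": [1, 2, 3],
})
clean_df = split_columns(survey)
print(clean_df.columns.tolist())
```

Leaving unsplit columns under their original name is a design choice here; the loop above would instead rename q1 to q1a.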

You can create a dict d that maps numbers to letters, then loop through the columns and rename them dynamically:

input:

import pandas as pd
df = pd.DataFrame({'q1': ['one', 'two', 'three'],
   'q2' : ['one/two/three', 'a/b/c', 'd/e/f'],
   'q3' : ['a/b/c', 'd/e/f','g/h/i']})

code:

ltrs = list('abcdefghijklmnopqrstuvwxyz')
d = dict(enumerate(ltrs))  # maps column position to letter: {0: 'a', 1: 'b', ...}

cols = df.columns[1:]
for col in cols:
    df1 = df[col].str.split('/', expand = True)
    df1.columns = df1.columns.map(d)
    df1 = df1.add_prefix(f'{col}')
    df = pd.concat([df,df1], axis=1)
df = df.drop(cols, axis=1)
df

output:

Out[1]: 
      q1  q2a  q2b    q2c q3a q3b q3c
0    one  one  two  three   a   b   c
1    two    a    b      c   d   e   f
2  three    d    e      f   g   h   i
