How to split one cell of csv into columns of dataframe based on separator

I have a CSV file in which all the data is in one column, and I'd like to split the numerical data in that column into several columns. After reading it into a data frame, the data looks like this:

      0
0     13:25:09 -> mm [ -5,  4,  15 ] dd [ 4, 77, 8 ]
1     13:25:09 -> mm [ -4,  9,  10 ] dd [ 8, 6, 10 ]
2     13:25:09 -> mm [ 0,  -4,  19 ] dd [ 3, 1, 66 ]

How can I do it?

I believe you need Series.str.extractall with Series.unstack:

# -? keeps the minus sign of negative numbers; a plain \d+ would drop it
df = df[0].str.extractall(r'(-?\d+)')[0].unstack()
print (df)
match   0   1   2   3   4   5  6   7   8
0      13  25  09  -5   4  15  4  77   8
1      13  25  09  -4   9  10  8   6  10
2      13  25  09   0  -4  19  3   1  66
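Note that extractall returns strings and also picks up the three time fields. If only the six bracketed numbers are wanted as integers, something along these lines might work. This is just a sketch: it assumes every row has the "time -> mm [...] dd [...]" layout shown in the question, and the column names are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame([
    "13:25:09 -> mm [ -5,  4,  15 ] dd [ 4, 77, 8 ]",
    "13:25:09 -> mm [ -4,  9,  10 ] dd [ 8, 6, 10 ]",
    "13:25:09 -> mm [ 0,  -4,  19 ] dd [ 3, 1, 66 ]",
])

# Keep only the text after the first "->" so the time digits are not extracted.
values = df[0].str.split("->", n=1).str[1]

# Extract every optionally signed integer, pivot matches into columns, cast to int.
nums = values.str.extractall(r"(-?\d+)")[0].unstack().astype(int)

# Hypothetical column names, not part of the original data.
nums.columns = ["mm_x", "mm_y", "mm_z", "dd_x", "dd_y", "dd_z"]
print(nums)
```

The time column can be recovered separately with df[0].str[:8] if it is needed.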

Given this CSV file:

csvfile = '''13:25:09 -> mm [ -5,  4,  15 ] dd [ 4, 77, 8 ]
13:25:09 -> mm [ -4,  9,  10 ] dd [ 8, 6, 10 ]
13:25:09 -> mm [ 0,  -4,  19 ] dd [ 3, 1, 66 ]'''

Wrong result

By doing

import pandas as pd

lines = csvfile.split('\n')
df = pd.DataFrame(lines)

you get a wrong result, with each whole line stored as one string in a single column:

                                                0
0  13:25:09 -> mm [ -5,  4,  15 ] dd [ 4, 77, 8 ]
1  13:25:09 -> mm [ -4,  9,  10 ] dd [ 8, 6, 10 ]
2  13:25:09 -> mm [ 0,  -4,  19 ] dd [ 3, 1, 66 ]

Nicer result

You should do:

import pandas as pd

lines = csvfile.split('\n')

df = pd.DataFrame({'id': [1,2,3], 
                   'time': [line[:8] for line in lines], 
                   'mm': [line[15:30] for line in lines],
                   'dd': [line[34:50] for line in lines]})

and you get

   id      time               mm            dd
0   1  13:25:09  [ -5,  4,  15 ]  [ 4, 77, 8 ]
1   2  13:25:09  [ -4,  9,  10 ]  [ 8, 6, 10 ]
2   3  13:25:09  [ 0,  -4,  19 ]  [ 3, 1, 66 ]
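The fixed slices above work because every line in this sample has exactly the same width. If the numbers can have a different number of digits, a regular expression with named groups may be more robust. A minimal sketch, assuming the layout "time -> mm [...] dd [...]" holds for every line (the pattern itself is my assumption, not something from the question):

```python
import pandas as pd

csvfile = '''13:25:09 -> mm [ -5,  4,  15 ] dd [ 4, 77, 8 ]
13:25:09 -> mm [ -4,  9,  10 ] dd [ 8, 6, 10 ]
13:25:09 -> mm [ 0,  -4,  19 ] dd [ 3, 1, 66 ]'''

lines = pd.Series(csvfile.split('\n'))

# Each named group becomes a column; the pattern does not depend on character positions.
pattern = r"(?P<time>\d{2}:\d{2}:\d{2}) -> mm (?P<mm>\[.*?\]) dd (?P<dd>\[.*?\])"
df = lines.str.extract(pattern)

# Add a 1-based id column in front, as in the slicing version.
df.insert(0, 'id', range(1, len(df) + 1))
print(df)
```

Lines that do not match the pattern come back as NaN in all three columns, which makes malformed rows easy to spot.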

What if I don't want strings but integers?

Note that each value in mm is a string:

print(type(df['mm'][0]))
<class 'str'>

It would be nice to have a list of integers instead:

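# regex=False treats '[' and ']' as literal characters on every pandas version
df['mm_list'] = df['mm'].str.replace('[', '', regex=False).str.replace(']', '', regex=False).str.split(',').values.tolist()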
df['mm_list_int'] = [[int(i) for i in x] for x in df['mm_list']]
df

This adds a new column mm_list_int:

   id      time               mm            dd            mm_list  mm_list_int
0   1  13:25:09  [ -5,  4,  15 ]  [ 4, 77, 8 ]  [ -5,   4,   15 ]  [-5, 4, 15]
1   2  13:25:09  [ -4,  9,  10 ]  [ 8, 6, 10 ]  [ -4,   9,   10 ]  [-4, 9, 10]
2   3  13:25:09  [ 0,  -4,  19 ]  [ 3, 1, 66 ]  [ 0,   -4,   19 ]  [0, -4, 19]

with the correct types:

print(type(df['mm_list_int'][0]))
<class 'list'>

print(type(df['mm_list_int'][0][0]))
<class 'int'>

That is, each entry is a list of integers.

What if I want the three mm values to be in different columns?

Use

objs = [df, pd.DataFrame(df['mm_list_int'].tolist(), columns=['mm_x', 'mm_y', 'mm_z'])]
df_final = pd.concat(objs, axis=1)
df_final = df_final[['id', 'time', 'mm', 'dd', 'mm_x', 'mm_y', 'mm_z']]

obtaining

   id      time               mm            dd  mm_x  mm_y  mm_z
0   1  13:25:09  [ -5,  4,  15 ]  [ 4, 77, 8 ]    -5     4    15
1   2  13:25:09  [ -4,  9,  10 ]  [ 8, 6, 10 ]    -4     9    10
2   3  13:25:09  [ 0,  -4,  19 ]  [ 3, 1, 66 ]     0    -4    19

Final touch

Do the same with dd and you're done:

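# regex=False treats '[' and ']' as literal characters on every pandas version
df['dd_list'] = df['dd'].str.replace('[', '', regex=False).str.replace(']', '', regex=False).str.split(',').values.tolist()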
df['dd_list_int'] = [[int(i) for i in x] for x in df['dd_list']]

objs = [df, 
        pd.DataFrame(df['mm_list_int'].tolist(), columns=['mm_x', 'mm_y', 'mm_z']),
        pd.DataFrame(df['dd_list_int'].tolist(), columns=['dd_x', 'dd_y', 'dd_z'])]
df_final = pd.concat(objs, axis=1)
df_final = df_final[['id', 'time', 'mm_x', 'mm_y', 'mm_z', 'dd_x', 'dd_y', 'dd_z']]

Final result

   id      time  mm_x  mm_y  mm_z  dd_x  dd_y  dd_z
0   1  13:25:09    -5     4    15     4    77     8
1   2  13:25:09    -4     9    10     8     6    10
2   3  13:25:09     0    -4    19     3     1    66
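Since the mm and dd steps are identical, they could be folded into one small helper. The sketch below is only one possible way to do it: the function name expand_int_list and the x/y/z suffixes are hypothetical, and it assumes every bracketed list holds exactly three integers.

```python
import pandas as pd

def expand_int_list(series, prefix):
    """Turn strings like '[ -5,  4,  15 ]' into three integer columns."""
    # Drop the surrounding brackets and spaces, then split into one column per value.
    parts = series.str.strip("[] ").str.split(",", expand=True)
    # Strip leftover spaces around each value before casting to int.
    parts = parts.apply(lambda col: col.str.strip().astype(int))
    parts.columns = [f"{prefix}_{axis}" for axis in "xyz"]
    return parts

df = pd.DataFrame({
    "id": [1, 2, 3],
    "time": ["13:25:09"] * 3,
    "mm": ["[ -5,  4,  15 ]", "[ -4,  9,  10 ]", "[ 0,  -4,  19 ]"],
    "dd": ["[ 4, 77, 8 ]", "[ 8, 6, 10 ]", "[ 3, 1, 66 ]"],
})

df_final = pd.concat(
    [df[["id", "time"]], expand_int_list(df["mm"], "mm"), expand_int_list(df["dd"], "dd")],
    axis=1,
)
print(df_final)
```

This skips the intermediate mm_list and mm_list_int columns, so there is nothing to drop afterwards.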
