
Removing square brackets from a pandas DataFrame

I have a dataframe whose values I need to strip of square brackets before feeding them into a loop.

I've tried using str.strip and str.commands. However, I get errors saying I must pass a dataframe with boolean values. It seems the values in my dataframe are lists.

m2 = pd.read_csv('newm2.csv', sep=',s', header=None)
print(m2)

Sample input:

m2 = pd.DataFrame([
    [[14,38,51,65,84,85]],
    [[3,34,58,65,66,75]],
    [[3,15,68,70,80,82]],
    [[19,31,42,50,54,97]],
    [[4,9,48,62,74,77]],
])
                         0
0      [14,38,51,65,84,85]
1       [3,34,58,65,66,75]
2       [3,15,68,70,80,82]
3      [19,31,42,50,54,97]
4        [4,9,48,62,74,77]

Above is a small example of what it's currently printing. I need each row to look like:

"14,38,51,65,84,85"

How do I solve the problem?

One simple way is to transform each list into a str:

import pandas as pd

x = [
    [[14,38,51,65,84,85]],
    [[3,34,58,65,66,75]],
    [[3,15,68,70,80,82]],
    [[19,31,42,50,54,97]],
    [[4,9,48,62,74,77]],
]

m2 = pd.DataFrame(x)
# Join the items of each list with commas; str() is needed because they are ints
m2[0] = m2[0].apply(lambda row: ','.join(str(i) for i in row))

m2
Out[1]:
                   0
0  14,38,51,65,84,85
1   3,34,58,65,66,75
2   3,15,68,70,80,82
3  19,31,42,50,54,97
4    4,9,48,62,74,77

Edit

What if the rows are of type str and not actually list? We first parse each string into a real list with ast.literal_eval and then perform the .join:

import pandas as pd
from ast import literal_eval

x = [
    ['[14,38,51,65,84,85]'],
    ['[3,34,58,65,66,75]'],
    ['[3,15,68,70,80,82]'],
    ['[19,31,42,50,54,97]'],
    ['[4,9,48,62,74,77]'],
]

m2 = pd.DataFrame(x)

# literal_eval turns the text '[14,38,...]' into the list [14, 38, ...]
m2[0] = m2[0].apply(lambda row: ','.join(str(i) for i in literal_eval(row)))
m2
Out[1]:
                   0
0  14,38,51,65,84,85
1   3,34,58,65,66,75
2   3,15,68,70,80,82
3  19,31,42,50,54,97
4    4,9,48,62,74,77
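
If you are not sure which of the two cases you have (real lists or strings that only look like lists), the two approaches above can be folded into one function. This is only a sketch: cell_to_csv is a made-up name, not something pandas provides, and it assumes every cell is either a list or a string that literal_eval can parse.

```python
from ast import literal_eval


def cell_to_csv(cell):
    """Hypothetical helper: return the items of a cell joined by commas.

    Accepts either a real list or its text form such as '[1,2,3]'.
    """
    if isinstance(cell, str):
        # Parse the text into a Python object; malformed text raises an error.
        cell = literal_eval(cell)
    return ','.join(str(item) for item in cell)


print(cell_to_csv([14, 38, 51, 65, 84, 85]))   # cell holds a real list
print(cell_to_csv('[14,38,51,65,84,85]'))      # cell holds a string
```

On the dataframe it would be used as m2[0] = m2[0].map(cell_to_csv).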

I would avoid apply because it tends to be slower. Here's another way, using a list comprehension:

import pandas as pd

m2 = pd.DataFrame([
    [[14,38,51,65,84,85]],
    [[3,34,58,65,66,75]],
    [[3,15,68,70,80,82]],
    [[19,31,42,50,54,97]],
    [[4,9,48,62,74,77]],
])
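# Convert each list to its text form, e.g. '[14, 38, 51, 65, 84, 85]'
m2.iloc[:, 0] = m2.iloc[:, 0].astype(str)
# Strip both brackets, drop the spaces, and wrap each value in double quotes
m2.iloc[:, 0] = ['"' + x.strip('[]').replace(' ', '') + '"' for x in m2.iloc[:, 0]]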
m2

The output:

                     0
0  "14,38,51,65,84,85"
1   "3,34,58,65,66,75"
2   "3,15,68,70,80,82"
3  "19,31,42,50,54,97"
4    "4,9,48,62,74,77"
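
The same strip-and-replace idea can also be written with the pandas .str accessor instead of a list comprehension. The sketch below is a variant, not the code from the answer above: it leaves out the surrounding double quotes and uses a shortened two-row sample. I have not timed it against the other versions, so treat any speed difference as something to measure on your own data.

```python
import pandas as pd

# Shortened sample with the same shape as the data in the question
m2 = pd.DataFrame([
    [[14, 38, 51, 65, 84, 85]],
    [[3, 34, 58, 65, 66, 75]],
])

# str() of each list gives text like '[14, 38, 51, 65, 84, 85]'
as_text = m2[0].astype(str)

# Strip the brackets from both ends, then drop the spaces after the commas
m2[0] = as_text.str.strip('[]').str.replace(' ', '', regex=False)

print(m2[0].tolist())
```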

Given a list

a = [1,2,3,4]

you can convert it to a string:

b = str(a) # out: '[1, 2, 3, 4]'

To drop the '[' and ']', slice off the first and last characters:

b = str(a)[1:-1] # out: '1, 2, 3, 4'

So we just have to apply this to every list in the column m2[0]:

import pandas as pd

m2 = pd.DataFrame([
    [[14,38,51,65,84,85]],
    [[3,34,58,65,66,75]],
    [[3,15,68,70,80,82]],
    [[19,31,42,50,54,97]],
    [[4,9,48,62,74,77]],
])

m2[0] = m2[0].apply(lambda x: str(x)[1:-1])

print(m2[0])

output:

0    14, 38, 51, 65, 84, 85
1     3, 34, 58, 65, 66, 75
2     3, 15, 68, 70, 80, 82
3    19, 31, 42, 50, 54, 97
4      4, 9, 48, 62, 74, 77
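
The output above keeps a space after each comma, while the question asks for "14,38,51,65,84,85" with none. A small sketch of the extra step, shown on a plain list (the variable names are only for illustration):

```python
a = [1, 2, 3, 4]

as_text = str(a)                           # list to text, brackets included
no_brackets = as_text[1:-1]                # slice off '[' and ']'
no_spaces = no_brackets.replace(' ', '')   # drop the space after each comma

print(no_spaces)
```

In the dataframe version this would become m2[0].apply(lambda x: str(x)[1:-1].replace(' ', '')).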

