简体   繁体   中英

convert pandas data frame column values into comma separated strings

i have a data frame that looks like this:

    Column1                      Column2

'['jjhjh', 'adads','adsd']',    'dwdwdqw'
'['adads','adsd']',             'dwdwdqw'
'['jjhjh', 'adads','adsd']',    'dwdwdqw'
'['adads','adsd']',             'dwdwdqw'
'['adads','adsd']',             'dwdwdqw'

Although the items in Column1 look like lists of items, they are strings. I want to remove the square brackets from this string, remove the quotes and replace all values in that column with the same values but as comma separated strings. My desired output would be:

   Column1                   Column2

'jjhjh', 'adads','adsd',    'dwdwdqw'
'adads','adsd',             'dwdwdqw'
'jjhjh', 'adads','adsd',    'dwdwdqw'
'adads','adsd',             'dwdwdqw'
'adads','adsd',             'dwdwdqw'

I tried the following function but it does not replace the elements:

def string_convert(column_name):
  lista=[]
  for i in column_name:
    i=i.strip("[]")
    i=eval(i)
    lista.append(i)
  for m in lista:
    if m == tuple:
      column_name = m[0] + ',' + m[1]
    else:
      column_name = m
  return df['other']

Can anyone help me with this? Thanks in advance.

According to the new final needs included by the question creator in her own answer , like a generating a final list with all values to column1 , I recommend doing it as follows:

import pandas as pd

df = pd.DataFrame(columns=['column1','column2'])
df['column1'] = ["'['jjhjh', 'adads','adsd']'","'['adads','adsd']'","'['jjhjh', 'adads','adsd']'","'['adads','adsd']'","'['adads','adsd']'"]
df['column2'] = ['dwdwdqw','dwdwdqw','dwdwdqw','dwdwdqw','dwdwdqw']

df['column1'] = df['column1'].replace('\[|\'|\"|\]| ', '', regex=True)

df['column1'] = df['column1'].str.split(',')

final_list = [x for xs in df['column1'].tolist() for x in xs]

print(final_list)

Output:

['jjhjh', 'adads', 'adsd', 'adads', 'adsd', 'jjhjh', 'adads', 'adsd', 'adads', 'adsd', 'adads', 'adsd']

But I confess that there is a risk in this method because if the string you want has a space between the words, they will be removed, so in your case I still think you should be careful with the use.

Greetings from Brazil!

This loop worked for me.

for i, row in df.iterrows():
    tmp_val = row['Column1'].replace("'", "").replace("[", "").replace("]", "").split(',')
    row['Column1'] = tmp_val

This loops through each row of the DataFrame and modifies the cell in Column1 by doing a string replace to remove the characters you don't want(square brackets and single quotes). Then.split(',') creates the list and the last line of code replaces the original value with our newly created list.

By leveraging str.replace() :

df.assign(Column1=df['Column1'].str.replace(r"'\[(.*)\]',", r"\1,", regex=True))

                    Column1                 Column2
0  'jjhjh', 'adads','adsd',               'dwdwdqw'
1           'adads','adsd',               'dwdwdqw'
2  'jjhjh', 'adads','adsd',               'dwdwdqw'
3           'adads','adsd',               'dwdwdqw'
4           'adads','adsd',               'dwdwdqw'

The best solution for my cases was the code below because I had in each string quotes, double quotes and squared brackets. With this code I could remove everything and after that the For loop I removed the unnecessary spaces and split each row into a list:

df['column1']=df['column1'].replace('\[|\'|\"|\]', '', regex=True)

df['column1']=df['column1'].str.split(',')

lista=[]
for s in df['column1']:
  for a in s:
    lista.append(a)

lista1=[b.strip() for b in lista]

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM