
Removing Strings from a Pandas DataFrame Column

I have a pandas dataframe as shown below.

DF1 =

sid                 path
 1    '["rome","is","in","province","lazio"]'   
 1    "['rome', 'is', 'in', 'province', 'naples']"
 1     ['N']
 1    "['rome', 'is', 'in', 'province', 'in', 'campania']"
 ....

I want to remove all the unnecessary characters from the path column, so that the result looks like this:

DF2 =

    sid                  path
     1         rome is in province lazio
     1         rome is in province naples
     1                    N
     1         rome is in province in campania
 ....

I tried replacing all the unnecessary characters like this:

 DF1["path"].replace("[","").replace("]","").replace('"',"").replace(","," ").replace("'","")

But it didn't work. I suppose it's due to entries like ['N'].
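The chained calls most likely had no visible effect for two reasons: the result was never assigned back to the column, and Series.replace (without regex=True) only replaces cells whose entire value equals the search string, so no cell ever matches "[". Substring replacement goes through the .str accessor instead. The ['N'] entry does cause a separate problem there: .str methods return NaN for a cell that holds a genuine list rather than a string. A minimal sketch of both behaviours, using a two-cell Series modelled on the data above:

```python
import pandas as pd

# One cell is a string that looks like a list; the other is a real Python list.
s = pd.Series(["['rome', 'is']", ['N']])

# Series.replace compares whole cell values: no cell is exactly "[",
# so nothing is replaced.
unchanged = s.replace("[", "")

# The .str accessor replaces substrings inside each string cell.
# regex=False treats "[" as a literal character, not a regex.
cleaned = (s.str.replace("[", "", regex=False)
            .str.replace("]", "", regex=False)
            .str.replace("'", "", regex=False)
            .str.replace(",", "", regex=False))

print(unchanged.tolist())
# The list cell cannot go through .str.replace, so it comes back as NaN.
print(cleaned.tolist())
```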

How can I do this? Any help is appreciated!

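You can use ast.literal_eval to safely parse lists that were stored as strings. One way to account for cells that already hold genuine lists is to catch the ValueError that literal_eval raises when it is given a non-string object.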

Note that, if at all possible, you should try to sort these issues upstream before they reach your dataframe.

from ast import literal_eval

import pandas as pd

df = pd.DataFrame({'sid': [1, 1, 1, 1],
                   'path': ['["rome","is","in","province","lazio"]',
                            "['rome', 'is', 'in', 'province', 'naples']",
                            ['N'],
                            "['rome', 'is', 'in', 'province', 'in', 'campania']"]})

def converter(x):
    try:
        # A string such as "['rome', 'is']" is parsed into a real list first.
        return ' '.join(literal_eval(x))
    except ValueError:
        # literal_eval raises ValueError for a cell that is already a list.
        return ' '.join(x)

df['path'] = df['path'].apply(converter)

print(df)

                              path  sid
0        rome is in province lazio    1
1       rome is in province naples    1
2                                N    1
3  rome is in province in campania    1
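A variant of the same idea branches on the cell type explicitly instead of catching ValueError. This is a sketch only; to_words is a hypothetical helper name, not part of the answer above:

```python
from ast import literal_eval

import pandas as pd

df = pd.DataFrame({'sid': [1, 1, 1, 1],
                   'path': ['["rome","is","in","province","lazio"]',
                            "['rome', 'is', 'in', 'province', 'naples']",
                            ['N'],
                            "['rome', 'is', 'in', 'province', 'in', 'campania']"]})

def to_words(cell):
    """Hypothetical helper: join a list, or a string holding a list literal."""
    if isinstance(cell, str):
        # Only strings are parsed; "['a', 'b']" becomes ['a', 'b'].
        cell = literal_eval(cell)
    # Genuine lists (and freshly parsed ones) are joined with spaces.
    return ' '.join(cell)

df['path'] = df['path'].map(to_words)
print(df)
```

The design difference matters for malformed input: with the try/except version, a string such as "rome" makes literal_eval raise ValueError, so the fallback joins it character by character into "r o m e". The isinstance check lets such an error surface instead of silently producing odd output.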

Using ast.literal_eval & str.join

Demo:

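import pandas as pd
import ast
df = pd.DataFrame({"path": ['["rome","is","in","province","lazio"]', "['rome', 'is', 'in', 'province', 'naples']", ['N']]})
# astype(str) turns the genuine list ['N'] into the string "['N']",
# so literal_eval can parse every cell; " ".join then flattens each list.
df['path'] = df['path'].astype(str).apply(ast.literal_eval).apply(lambda x: " ".join(x))
print(df)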

Output:

                         path
0   rome is in province lazio
1  rome is in province naples
2                           N
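The key step in this answer is the round trip through astype(str): str() of a list of strings produces a valid list literal, which literal_eval turns back into a list. As a small sketch on a two-cell Series, the final lambda can also be replaced by Series.str.join, which joins the elements of list cells with the given separator:

```python
import ast

import pandas as pd

s = pd.Series(["['rome', 'is']", ['N']])

# The real list ['N'] becomes the string "['N']"; the other cell is unchanged.
as_text = s.astype(str)

# Every cell is now a list literal, so literal_eval parses each one.
parsed = as_text.apply(ast.literal_eval)

# Series.str.join(" ") does the same as .apply(lambda x: " ".join(x)).
joined = parsed.str.join(" ")
print(joined.tolist())
```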
