简体   繁体   中英

how to remove whitespace from string in pandas column

I need to remove whitespaces in pandas df column. My data looks like this:

industry            magazine
Home                "Goodhousekeeping.com"; "Prevention.com";
Fashion             "Cosmopolitan"; " Elle"; "Vogue"
Fashion             " Vogue"; "Elle"

Below is my code:

# split magazine column values, create a new column in df 
df['magazine_list'] = dfl['magazine'].str.split(';')

# stip the first whitespace from strings
df.magazine_list = df.magazine_list.str.lstrip()

This returns all NaN, I have also tried:

df.magazine = df.magazine.str.lstrip()

This didn't remove the white spaces either.

Use list comprehension with strip of splitted values, also strip values before split for remove trailing ; , spaces and " values:

f = lambda x: [y.strip('" ') for y in x.strip(';" ').split(';')]
df['magazine_list'] = df['magazine'].apply(f)
print (df)
  industry                                 magazine  \
0     Home  Goodhousekeeping.com; "Prevention.com";   
1  Fashion           Cosmopolitan; " Elle"; "Vogue"   
2  Fashion                             Vogue; "Elle   

                            magazine_list  
0  [Goodhousekeeping.com, Prevention.com]  
1             [Cosmopolitan, Elle, Vogue]  
2                           [Vogue, Elle]  

Jezrael provides a good solution. It is useful to know that pandas has string accessors for similar operations without the need of list comprehensions. Normally a list comprehension is faster, but depending on the use case using pandas built-in functions could be more readable or simpler to code.

df['magazine'] = (
    df['magazine']
    .str.replace(' ', '', regex=False)
    .str.replace('"', '', regex=False)
    .str.strip(';')
    .str.split(';')
)

Output

  industry                                magazine
0     Home  [Goodhousekeeping.com, Prevention.com]
1  Fashion             [Cosmopolitan, Elle, Vogue]
2  Fashion                           [Vogue, Elle]

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM