How do I remove whitespace from strings in a DataFrame column?

Question

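I am trying to loop over a column in a pandas DataFrame to remove unnecessary whitespace at the beginning and end of the strings in it. My DataFrame looks like this: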

df={'c1': [' ab', 'fg', 'ac ', 'hj-jk ', ' ac', 'df, gh', 'gh', 'ab', 'ad', 'jk-pl', 'ae', 'kl-kl '], 'b2': ['ba', 'bc', 'bd', 'be', 'be', 'be', 'ba'] }


    c1  b2
0   ab, fg
1   ac, hj-jk   
2   ac, df,gh   
3   gh, be
4   ab, be
5   ad, jk-pl
6   ae, kl-kl

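I tried the answer here, but that did not work either. I need to remove the whitespace from the strings in this column because I want to one-hot encode the column with get_dummies(). My idea was to strip the whitespace from each value with strip() and then call .str.get_dummies(','):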

#function to remove white space from strings
def strip_string(dataframe, column_name):
  for id, item in dataframe[column_name].items():
    a=item.strip()

#removing the white space from the values of the column
strip_string(df, 'c1')

#creating one hot-encoded columns from the values using split(",")

df1=df['c1'].str.get_dummies(',')

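But my code returns duplicated columns, which I do not want. I suspect the function that removes the whitespace is not working properly. Can anyone help? My current output is: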

   ab   ac  df  fg  gh  hj-jk   jk-pl   kl-kl   ab  ac  ad  ae  gh
0   1   0   0   1   0   0   0   0   0   0   0   0   0
1   0   0   0   0   0   1   0   0   0   1   0   0   0
2   0   1   1   0   1   0   0   0   0   0   0   0   0
3   0   0   0   0   0   0   0   0   0   0   0   0   1
4   0   0   0   0   0   0   0   0   1   0   0   0   0
5   0   0   0   0   0   0   1   0   0   0   1   0   0
6   0   0   0   0   0   0   0   1   0   0   0   1   0

The columns 'ac' and 'ab' are duplicated. I want to get rid of the duplicated columns.

Answer 1

Update:

I think you need to handle the whitespace around commas as well as at the start and end of each string for Series.str.get_dummies() to work on your example:

df = df.apply(lambda x: x.str.strip().str.replace(' *, *', ',', regex=True))

Input:

        c1   b2
0       ab  foo
1       fg  foo
2      ac   foo
3   hj-jk   foo
4       ac  foo
5   df, gh  foo
6       gh  foo
7       ab  foo
8       ad  foo
9    jk-pl  foo
10      ae  foo
11  kl-kl   foo

Intermediate DataFrame (after removing whitespace at the start and end and next to commas):

       c1   b2
0      ab  foo
1      fg  foo
2      ac  foo
3   hj-jk  foo
4      ac  foo
5   df,gh  foo
6      gh  foo
7      ab  foo
8      ad  foo
9   jk-pl  foo
10     ae  foo
11  kl-kl  foo

Output:

    ab  ac  ad  ae  df  fg  gh  hj-jk  jk-pl  kl-kl
0    1   0   0   0   0   0   0      0      0      0
1    0   0   0   0   0   1   0      0      0      0
2    0   1   0   0   0   0   0      0      0      0
3    0   0   0   0   0   0   0      1      0      0
4    0   1   0   0   0   0   0      0      0      0
5    0   0   0   0   1   0   1      0      0      0
6    0   0   0   0   0   0   1      0      0      0
7    1   0   0   0   0   0   0      0      0      0
8    0   0   1   0   0   0   0      0      0      0
9    0   0   0   0   0   0   0      0      1      0
10   0   0   0   1   0   0   0      0      0      0
11   0   0   0   0   0   0   0      0      0      1
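To make the update above easy to try, here is a minimal, self-contained sketch. It assumes the twelve c1 values from the question and leaves out b2, whose length differs in the question's dict. The variable names cleaned and dummies are only illustrative.

```python
import pandas as pd

# The twelve c1 values from the question (b2 is omitted in this sketch).
df = pd.DataFrame({
    "c1": [" ab", "fg", "ac ", "hj-jk ", " ac", "df, gh",
           "gh", "ab", "ad", "jk-pl", "ae", "kl-kl "],
})

# Strip both ends, then collapse any spaces around commas.
# Since pandas 2.0 the default is regex=False, so regex=True is passed explicitly.
cleaned = df["c1"].str.strip().str.replace(r" *, *", ",", regex=True)

# One indicator column per comma-separated token.
dummies = cleaned.str.get_dummies(sep=",")
print(dummies.columns.tolist())
```

With the spaces gone before the split, each token appears under a single column label.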

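If you only use strip() (as in my earlier answer below), you get something like this, with gh duplicated, because 'df, gh' is split into 'df' and ' gh' (with a leading space):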

     gh  ab  ac  ad  ae  df  fg  gh  hj-jk  jk-pl  kl-kl
0     0   1   0   0   0   0   0   0      0      0      0
1     0   0   0   0   0   0   1   0      0      0      0
2     0   0   1   0   0   0   0   0      0      0      0
3     0   0   0   0   0   0   0   0      1      0      0
4     0   0   1   0   0   0   0   0      0      0      0
5     1   0   0   0   0   1   0   0      0      0      0
6     0   0   0   0   0   0   0   1      0      0      0
7     0   1   0   0   0   0   0   0      0      0      0
8     0   0   0   1   0   0   0   0      0      0      0
9     0   0   0   0   0   0   0   0      0      1      0
10    0   0   0   0   1   0   0   0      0      0      0
11    0   0   0   0   0   0   0   0      0      0      1
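The duplicate comes from the space that follows the comma: strip() only touches the two ends of the whole string, not the pieces produced by the split. A small sketch with a made-up three-value Series (not the question's full data) shows the two distinct labels:

```python
import pandas as pd

# Illustrative values only: one cell holds two tokens separated by ", ".
s = pd.Series(["ab", "df, gh", "gh "])

# strip() removes the trailing space of "gh " but not the space inside "df, gh".
stripped_only = s.str.strip().str.get_dummies(sep=",")

labels = stripped_only.columns.tolist()
print(labels)  # " gh" and "gh" show up as separate column labels
```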

Earlier answer:

Either of the following should work:

df = df.applymap(lambda x: x.strip())

... or:

df = df.apply(lambda x: x.str.strip())
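As a side note on why the helper in the question seemed to do nothing: item.strip() returns a new string, and the loop stores it in the local variable a without ever writing it back to the DataFrame. A minimal sketch of a version that assigns the result back, assuming a small three-row frame for illustration:

```python
import pandas as pd

df = pd.DataFrame({"c1": [" ab", "fg", "ac "]})

def strip_string(dataframe, column_name):
    # Hypothetical rewrite of the question's helper: the stripped values
    # are assigned back to the column instead of being discarded.
    dataframe[column_name] = dataframe[column_name].str.strip()
    return dataframe

df = strip_string(df, "c1")
print(df["c1"].tolist())  # ['ab', 'fg', 'ac']
```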

Answer 2

I would stack, strip, get_dummies and groupby.max:

If the separator is ', ':

df.stack().str.strip().str.get_dummies(sep=', ').groupby(level=0).max()

Otherwise:

df.stack().str.replace(r'\s', '', regex=True).str.get_dummies(sep=',').groupby(level=0).max()

Output:

   ab  ac  ba  bc  bd  be  df  fg  gh  hj-jk
0   1   0   1   0   0   0   0   0   0      0
1   0   0   0   1   0   0   0   1   0      0
2   0   1   0   0   1   0   0   0   0      0
3   0   0   0   0   0   1   0   0   0      1
4   0   1   0   0   0   1   0   0   0      0
5   0   0   0   0   0   1   1   0   1      0
6   0   0   1   0   0   0   0   0   1      0
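For anyone who wants to reproduce this output, here is a self-contained sketch of the second variant. It assumes the DataFrame is built from the first seven c1 values of the question paired with the seven b2 values, which is a guess consistent with the table above rather than something the answer states.

```python
import pandas as pd

# Assumption: the first seven c1 values from the question, paired with b2.
df = pd.DataFrame({
    "c1": [" ab", "fg", "ac ", "hj-jk ", " ac", "df, gh", "gh"],
    "b2": ["ba", "bc", "bd", "be", "be", "be", "ba"],
})

out = (
    df.stack()                             # one long Series indexed by (row, column)
      .str.replace(r"\s", "", regex=True)  # drop every whitespace character
      .str.get_dummies(sep=",")            # one indicator column per token
      .groupby(level=0).max()              # merge the c1 and b2 indicators per row
)
print(out)
```

Note that removing every whitespace character would also join tokens that contain spaces on purpose, so this variant fits data like the example, where no token has an internal space.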

Answer 3

See if this helps:

import pandas as pd

# Sample data with stray leading and trailing spaces
data = {'c1': [' ab ', 'fg', 'ac ', 'hj-jk '], 'b2': ['ba', 'bc', 'bd', 'be']}
df = pd.DataFrame(data)
print(df.head())

# Strip whitespace from every value in every column
df = df.apply(lambda x: x.map(str.strip))
print(df.head())
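One caveat with mapping str.strip directly: it raises a TypeError when a cell holds a missing value, because NaN is a float rather than a string. The .str accessor passes missing values through instead. A small sketch, assuming a hypothetical column with one NaN (the question's data has none):

```python
import numpy as np
import pandas as pd

# Hypothetical data with a missing value, for illustration only.
df = pd.DataFrame({"c1": [" ab ", np.nan, "ac "]})

# .str.strip() leaves the missing value in place instead of raising.
cleaned = df.apply(lambda col: col.str.strip())
print(cleaned["c1"].tolist())
```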


Answer 1: 1 vote, 2022-06-18 18:46:04
Answer 2: 1 vote, accepted, 2022-06-18 18:56:55
Answer 3: 0 votes, 2022-06-18 18:42:08
