
.isspace() on a dataframe cell

I need to replace the values in a dataframe that contain only whitespace. I tried the following code, but it replaces every value in the column:

books['original_title'] = books.apply(lambda row: row['title']
                                    if (str(row['original_title']).isspace() == True)
                                    else row['title'],
                                    axis=1)

For example, for this df:

books = pd.DataFrame({'title': ['If You Take a Mouse to School', 'Sea of Swords', 'SHOULD NOT CHANGE'], 
              'original_title': ['   ', ' ', 'NOT CHANGING']})

The expected answer corresponds to the following dataframe:

expected_answer = pd.DataFrame({'title': ['If You Take a Mouse to School', 'Sea of Swords', 'SHOULD NOT CHANGE'], 
              'original_title': ['If You Take a Mouse to School', 'Sea of Swords', 'NOT CHANGING']})

But I only get this:

answer = pd.DataFrame({'title': ['If You Take a Mouse to School', 'Sea of Swords', 'SHOULD NOT CHANGE'], 
              'original_title': ['If You Take a Mouse to School', 'Sea of Swords', 'SHOULD NOT CHANGE']})

I would appreciate it if anyone could help.
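A note on why the attempt above replaces every value: both branches of the conditional expression return row['title'], so the result is the same whether or not the cell is whitespace-only. A minimal sketch of the row-wise approach with the else branch corrected, using the sample data from the question:

```python
import pandas as pd

books = pd.DataFrame({
    'title': ['If You Take a Mouse to School', 'Sea of Swords', 'SHOULD NOT CHANGE'],
    'original_title': ['   ', ' ', 'NOT CHANGING'],
})

# Fall back to 'title' only when 'original_title' is whitespace-only;
# otherwise keep the existing 'original_title' value.
books['original_title'] = books.apply(
    lambda row: row['title']
    if str(row['original_title']).isspace()
    else row['original_title'],
    axis=1,
)

print(books['original_title'].tolist())
# ['If You Take a Mouse to School', 'Sea of Swords', 'NOT CHANGING']
```

This works but iterates row by row in Python; the vectorized alternatives are usually preferable on large frames.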

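First use Series.replace to turn empty or whitespace-only strings into NaN, then use Series.fillna to fill those missing values in the original_title column with the values from the title column: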

import numpy as np

# Turn empty or whitespace-only strings into NaN, then fill them from 'title'.
books['original_title'] = (
    books['original_title'].replace(
        r'^\s*$', np.nan, regex=True).fillna(books['title'])
)

Result:

print(books)
                           title                 original_title
0  If You Take a Mouse to School  If You Take a Mouse to School
1                  Sea of Swords                  Sea of Swords
2              SHOULD NOT CHANGE                   NOT CHANGING
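The pattern `^\s*$` matches zero or more whitespace characters, so it catches empty strings as well as whitespace-only ones, while strings that merely have padding around real text are left alone. A small self-contained sketch of that behaviour; the series `s` and `fallback` are made-up illustration data, not part of the question:

```python
import numpy as np
import pandas as pd

# Hypothetical data: whitespace-only, empty, normal, and padded strings.
s = pd.Series(['   ', '', 'NOT CHANGING', ' padded '])
fallback = pd.Series(['A', 'B', 'C', 'D'])

# Blank-like strings become NaN and are then filled from the fallback series.
cleaned = s.replace(r'^\s*$', np.nan, regex=True).fillna(fallback)

print(cleaned.tolist())
# ['A', 'B', 'NOT CHANGING', ' padded ']
```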

Alternatively, use Series.where with a mask: Series.str.strip removes the surrounding whitespace, and converting the result to bool turns the now-empty strings into False:

mask = books['original_title'].str.strip().astype(bool)
books['original_title'] = books['original_title'].where(mask, books['title'])

print (books)
                           title                 original_title
0  If You Take a Mouse to School  If You Take a Mouse to School
1                  Sea of Swords                  Sea of Swords
2              SHOULD NOT CHANGE                   NOT CHANGING

Details:

print (mask)
0    False
1    False
2     True
Name: original_title, dtype: bool

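A similar idea is to test with Series.str.contains, using a regular expression that matches zero or more whitespace characters, and then set the values where the mask is True with Series.mask: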

# Raw string so that \s is passed to the regex engine unchanged.
mask1 = books['original_title'].str.contains(r'^\s*$')
books['original_title'] = books['original_title'].mask(mask1, books['title'])

Details:

print (mask1)
0     True
1     True
2    False
Name: original_title, dtype: bool
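If the column may also contain missing values (the question's data does not, so this is an assumption), the `na` parameter of Series.str.contains controls how they appear in the mask. A minimal sketch with made-up data, where `titles` and `original` are hypothetical names:

```python
import pandas as pd

# Hypothetical data with a missing value, which the original example does not have.
titles = pd.Series(['A', 'B', 'C'])
original = pd.Series(['  ', None, 'NOT CHANGING'])

# na=True marks missing values for replacement as well;
# na=False would leave them untouched instead.
mask1 = original.str.contains(r'^\s*$', na=True).astype(bool)
result = original.mask(mask1, titles)

print(result.tolist())
```

Here both the whitespace-only entry and the missing entry are replaced from `titles`.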

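Series.str.isspace can also be used, but it misses empty strings, because ''.isspace() returns False (the sample data below has been changed to include one):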

books = pd.DataFrame({'title': ['If You Take a Mouse to School', 'Sea of Swords', 'SHOULD NOT CHANGE'], 
              'original_title': ['   ', '', 'NOT CHANGING']})

mask = books['original_title'].str.isspace()
books['original_title'] = books['original_title'].mask(mask, books['title'])

print (books)
                           title                 original_title
0  If You Take a Mouse to School  If You Take a Mouse to School
1                  Sea of Swords                               
2              SHOULD NOT CHANGE                   NOT CHANGING
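If Series.str.isspace is still preferred, one way to cover the empty-string case is to combine it with an explicit comparison against ''. A minimal sketch on the modified data from above:

```python
import pandas as pd

books = pd.DataFrame({
    'title': ['If You Take a Mouse to School', 'Sea of Swords', 'SHOULD NOT CHANGE'],
    'original_title': ['   ', '', 'NOT CHANGING'],
})

s = books['original_title']
# True for whitespace-only strings OR for empty strings.
mask = s.str.isspace() | s.eq('')
books['original_title'] = s.mask(mask, books['title'])

print(books['original_title'].tolist())
# ['If You Take a Mouse to School', 'Sea of Swords', 'NOT CHANGING']
```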

