How to mask columns with some NaN values, using regular expressions in pandas?
I have a DataFrame with a column listing the boroughs each user visited (among many other columns):
Index  User      Boroughs_visited
0      Eminem    Manhattan, Bronx
1      BrSpears  NaN
2      Elvis     Brooklyn
3      Adele     Queens, Brooklyn
I want to create a third column showing which users visited Brooklyn, so I wrote what must be the slowest possible code in Python:
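import re
import pandas as pd

df['Brooklyn'] = 0

def borough():
    # Row-by-row loop: set 1 where the borough list contains the whole word "Brooklyn"
    for idx, x in df['Boroughs_visited'].items():
        if pd.isnull(x):
            continue  # skip users with no boroughs recorded
        elif re.search(r'\bBrooklyn\b', x):
            df.loc[idx, 'Brooklyn'] = 1  # .loc avoids chained assignment

borough()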
This results in:
Index  User      Boroughs_visited  Brooklyn
0      Eminem    Manhattan, Bronx  0
1      BrSpears  NaN               0
2      Elvis     Brooklyn          1
3      Adele     Queens, Brooklyn  1
It took my computer 15 seconds to process 2,000 rows. Is there a faster way?
Let's use the .str accessor with contains and fillna:
df['Brooklyn'] = (df.Boroughs_visited.str.contains('Brooklyn') * 1).fillna(0)
Or, the same statement written another way:
df['Brooklyn'] = df.Boroughs_visited.str.contains('Brooklyn').mul(1, fill_value=0)
Output:
   Index  User      Boroughs_visited  Brooklyn
0  0      Eminem    Manhattan, Bronx  0
1  1      BrSpears  NaN               0
2  2      Elvis     Brooklyn          1
3  3      Adele     Queens, Brooklyn  1
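A variant of the same idea, offered as a sketch rather than as the answerer's code: passing na=False to str.contains makes missing values count as non-matches, so neither fillna nor the multiply-by-one trick is needed, and astype(int) gives an integer column directly. The \b word boundaries from the question's original regex are kept so only the whole word Brooklyn matches. The sample frame is rebuilt from the table in the question.

```python
import numpy as np
import pandas as pd

# Sample data from the question (NaN marks a user with no boroughs recorded)
df = pd.DataFrame({
    "User": ["Eminem", "BrSpears", "Elvis", "Adele"],
    "Boroughs_visited": ["Manhattan, Bronx", np.nan, "Brooklyn", "Queens, Brooklyn"],
})

# na=False treats missing values as "no match", so the result is a plain
# boolean Series; \b keeps the whole-word matching of the original loop.
df["Brooklyn"] = (
    df["Boroughs_visited"]
    .str.contains(r"\bBrooklyn\b", regex=True, na=False)
    .astype(int)
)

print(df)
```

This is one vectorized call over the whole column instead of a Python-level loop, which is where the speedup over the original code comes from.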
You can get all the boroughs for the price of one:
df.join(df.Boroughs_visited.str.get_dummies(sep=', '))
   Index  User      Boroughs_visited  Bronx  Brooklyn  Manhattan  Queens
0  0      Eminem    Manhattan, Bronx  1      0         1          0
1  1      BrSpears  NaN               0      0         0          0
2  2      Elvis     Brooklyn          0      1         0          0
3  3      Adele     Queens, Brooklyn  0      1         0          1
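One caveat with get_dummies is that sep is matched literally, so a value such as "Queens,Brooklyn" (no space) would become a single tag under sep=', '. The sketch below assumes a hypothetical version of the data with that inconsistent spacing; it normalizes every separator to a bare comma with a regex str.replace (NaN passes through unchanged) and then splits on ",".

```python
import numpy as np
import pandas as pd

# Hypothetical variant of the question's data: the last row has no space
# after the comma, which sep=", " alone would not split.
df = pd.DataFrame({
    "User": ["Eminem", "BrSpears", "Elvis", "Adele"],
    "Boroughs_visited": ["Manhattan, Bronx", np.nan, "Brooklyn", "Queens,Brooklyn"],
})

# Collapse any whitespace around commas to a single ",", then build one
# 0/1 column per distinct borough; a NaN row gets 0 in every column.
normalized = df["Boroughs_visited"].str.replace(r"\s*,\s*", ",", regex=True)
dummies = normalized.str.get_dummies(sep=",")

out = df.join(dummies)
print(out)
```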
But if you really, really only want Brooklyn:
df.join(df.Boroughs_visited.str.get_dummies(sep=', ').Brooklyn)
   Index  User      Boroughs_visited  Brooklyn
0  0      Eminem    Manhattan, Bronx  0
1  1      BrSpears  NaN               0
2  2      Elvis     Brooklyn          1
3  3      Adele     Queens, Brooklyn  1
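Selecting .Brooklyn only works if at least one row mentions Brooklyn; otherwise get_dummies never creates that column and the attribute access fails. A minimal sketch of a guard, using a hypothetical helper named visited_flag: DataFrame.reindex with fill_value=0 adds an all-zero column for a borough that never appears.

```python
import numpy as np
import pandas as pd

def visited_flag(s: pd.Series, borough: str) -> pd.Series:
    """Hypothetical helper: 0/1 Series marking rows whose list contains `borough`."""
    dummies = s.str.get_dummies(sep=", ")
    # reindex creates the column filled with 0 when no row mentions `borough`,
    # instead of failing the way dummies[borough] or dummies.Brooklyn would.
    return dummies.reindex(columns=[borough], fill_value=0)[borough]

# Sample data from the question
df = pd.DataFrame({
    "User": ["Eminem", "BrSpears", "Elvis", "Adele"],
    "Boroughs_visited": ["Manhattan, Bronx", np.nan, "Brooklyn", "Queens, Brooklyn"],
})

df = df.join(visited_flag(df["Boroughs_visited"], "Brooklyn"))
staten = visited_flag(df["Boroughs_visited"], "Staten Island")  # absent borough

print(df)
print(staten)
```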