Pandas: make new columns from a substring of another column
I am trying to create new columns in pandas from a substring of another column.
import pandas as pd
import re
# Wrap the dict in a DataFrame so that df.title and the .str accessor are available
df = pd.DataFrame({'title':['Apartment 2 roomns, 40 m²', 'House 7 rooms, 183 m²', 'House 4 rooms, 93 m²', 'Apartment 12 rooms, 275 m²']})
I am trying to use a regex with a capture group:
df['Name'] = df.title.str.extract(r'(^[a-zA-Z]+)', expand=True)
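This gives me a good result. But I also need one column with the number of rooms (without the word "rooms") and another with the size (without the "m²"). I tried: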
df['Rooms'] = df.title.str.replace(r'(^[0-9]+)\s(rooms)', r'\1') #to capture only the first group, which is the number
df['Size'] = df.title.str.replace(r'(^[0-9]+)\s(m²)', r'\1') #to capture only the first group, which is the number
My output:
   Name       Rooms                       Size
0  Apartment  Apartment 2 roomns, 40 m²   Apartment 2 roomns, 40 m²
1  House      House 7 rooms, 183 m²       House 7 rooms, 183 m²
2  House      House 4 rooms, 93 m²        House 4 rooms, 93 m²
3  Apartment  Apartment 12 rooms, 275 m²  Apartment 12 rooms, 275 m²
Desired output:
   Name       Rooms  Size
0  Apartment  2      40
1  House      7      183
2  House      4      93
3  Apartment  12     275
You can use
df["Rooms"] = df["title"].str.extract(r'(\d+)\s*room', expand=False)
df['Size'] = df["title"].str.extract(r'(\d+(?:\.\d+)?)\s*m²', expand=False)
Output:
>>> df
   title                       Rooms  Size
0  Apartment 2 roomns, 40 m²   2      40
1  House 7 rooms, 183 m²       7      183
2  House 4 rooms, 93 m²        4      93
3  Apartment 12 rooms, 275 m²  12     275
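The (\d+)\s*room regex matches one or more digits and captures them into Group 1, then just matches zero or more whitespace characters (\s*), and then the room string. Because str.extract returns only the captured group, the Rooms column contains just the number.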
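The (\d+(?:\.\d+)?)\s*m² regex matches and captures one or more digits followed by an optional sequence of a . and one or more digits, then matches zero or more whitespace characters, and then the m² string.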
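As a possible variation on the two calls above, here is a minimal sketch that pulls all three values out in a single str.extract call using named groups, and then converts the results to numbers, since str.extract returns strings. The combined pattern, the room\w* part (added so that the misspelled "roomns" in the sample data still matches) and the parts variable are assumptions made for illustration, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "title": [
        "Apartment 2 roomns, 40 m²",
        "House 7 rooms, 183 m²",
        "House 4 rooms, 93 m²",
        "Apartment 12 rooms, 275 m²",
    ]
})

# One pattern with named groups: each named group becomes a column
# of the DataFrame returned by str.extract.
pattern = (
    r"^(?P<Name>[a-zA-Z]+)\s+"       # leading word, e.g. "House"
    r"(?P<Rooms>\d+)\s*room\w*,\s*"  # room count, then "rooms" (or "roomns")
    r"(?P<Size>\d+(?:\.\d+)?)\s*m²"  # size, with an optional decimal part
)
parts = df["title"].str.extract(pattern)

# The extracted values are strings, so convert the numeric columns explicitly.
parts["Rooms"] = pd.to_numeric(parts["Rooms"])
parts["Size"] = pd.to_numeric(parts["Size"])

df = df.join(parts)
print(df[["Name", "Rooms", "Size"]])
```

Rows that do not match the whole pattern would get NaN in all three columns, so the two separate extract calls shown earlier are more forgiving when the titles vary in format.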