Python Pandas Extract word from column that contains String with Regex
Regex to extract data from a string in pandas column
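Help me write a regular expression that processes the strings in rawDim and extracts the height, width, and depth (as float64 values).
Bonus: is there a single regular expression that works for all 5 examples?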
import pandas as pd
dim_df = pd.read_csv("dim_df_correct.csv")
dim_df
   rawDim                                             height  width  depth
0  19×52cm                                              19.0   52.0    NaN
1  50 x 66,4 cm                                         50.0   66.4    NaN
2  168.9 x 274.3 x 3.8 cm (66 1/2 x 108 x 1 1/2 in.)   168.9  274.3    3.8
3  Sheet: 16 1/4 × 12 1/4 in. (41.3 × 31.1 cm) Im...    35.6   25.1    NaN
4  5 by 5in                                             12.7   12.7    NaN
import re
import pandas as pd
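You can use the regex '(?P<height>[\d.]+)\s*(?:[x×]|by)\s*(?P<width>[\d.]+)\s*(?:[x×]\s*(?P<depth>[\d.]+))?' after first replacing decimal commas with dots (df here stands for the question's dim_df):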
df[['rawDim']].join(
df['rawDim'].str.replace(r'(\d+),', r'\1.', regex=True)
.str.extract(r'(?P<height>[\d.]+)\s*(?:[x×]|by)\s*(?P<width>[\d.]+)\s*(?:[x×]\s*(?P<depth>[\d.]+))?')
.astype(float)
)
Output:
   rawDim                                             height  width  depth
0  19×52cm                                              19.0   52.0    NaN
1  50 x 66,4 cm                                         50.0   66.4    NaN
2  168.9 x 274.3 x 3.8 cm (66 1/2 x 108 x 1 1/2 in.)   168.9  274.3    3.8
3  Sheet: 16 1/4 × 12 1/4 in. (41.3 × 31.1 cm) Im...     4.0   12.0    NaN
4  5 by 5in                                              5.0    5.0    NaN
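For anyone without dim_df_correct.csv, here is a minimal self-contained sketch of the same approach. It assumes the five rawDim strings exactly as displayed above (row 3 is truncated with "Im..." in the source and is kept that way), and it spells the pattern out with re.VERBOSE so each part can be commented. The names PATTERN, dims, and result are only illustrative.

```python
import re

import pandas as pd

# Sample rows copied from the question; row 3 is truncated ("Im...") in the source.
df = pd.DataFrame({
    "rawDim": [
        "19×52cm",
        "50 x 66,4 cm",
        "168.9 x 274.3 x 3.8 cm (66 1/2 x 108 x 1 1/2 in.)",
        "Sheet: 16 1/4 × 12 1/4 in. (41.3 × 31.1 cm) Im...",
        "5 by 5in",
    ]
})

# Same pattern as in the answer, written in verbose mode for readability.
PATTERN = r"""
    (?P<height>[\d.]+)              # first number
    \s*(?:[x×]|by)\s*               # separator: x, ×, or the word by
    (?P<width>[\d.]+)               # second number
    \s*
    (?:[x×]\s*(?P<depth>[\d.]+))?   # optional third number
"""

dims = (
    df["rawDim"]
    .str.replace(r"(\d+),", r"\1.", regex=True)  # decimal comma -> dot
    .str.extract(PATTERN, flags=re.VERBOSE)      # named groups become columns
    .astype(float)                               # unmatched groups stay NaN
)
result = df[["rawDim"]].join(dims)
print(result)
```

The pattern takes the first place in the string where a number is followed by a separator and another number, which is why row 3 yields 4.0 and 12.0 (the "4" of "16 1/4" and the following "12") rather than the cm values.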
Note: add \s*cm\b at the end of the pattern to ensure that only cm dimensions are matched.
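One way to read that note is to append \s*cm\b to the end of the pattern, sketched below on the same five sample strings (row 3 again kept truncated as in the source). CM_PATTERN and cm_dims are hypothetical names, and this is only one possible placement of the suffix.

```python
import re

import pandas as pd

# Sample rows copied from the question; row 3 is truncated ("Im...") in the source.
df = pd.DataFrame({
    "rawDim": [
        "19×52cm",
        "50 x 66,4 cm",
        "168.9 x 274.3 x 3.8 cm (66 1/2 x 108 x 1 1/2 in.)",
        "Sheet: 16 1/4 × 12 1/4 in. (41.3 × 31.1 cm) Im...",
        "5 by 5in",
    ]
})

# The answer's pattern with a trailing "cm" requirement (assumed placement).
CM_PATTERN = r"""
    (?P<height>[\d.]+)
    \s*(?:[x×]|by)\s*
    (?P<width>[\d.]+)
    \s*
    (?:[x×]\s*(?P<depth>[\d.]+))?
    \s*cm\b                         # the dimensions must be followed by cm
"""

cm_dims = (
    df["rawDim"]
    .str.replace(r"(\d+),", r"\1.", regex=True)  # decimal comma -> dot
    .str.extract(CM_PATTERN, flags=re.VERBOSE)
    .astype(float)
)
print(df[["rawDim"]].join(cm_dims))
```

With this variant, row 3 picks up the first cm group in the string (41.3 and 31.1), while row 4, which has only an inch measurement, no longer matches and comes back as NaN in all three columns. Converting inch-only rows to cm would need a separate step.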