How do I separate a complicated measurement unit containing characters, symbols, and numbers from the measurement value in a DataFrame?
I can separate simple measurement units from measurement values when they appear together in a DataFrame by using the answer suggested in "How do I separate measurement value and unit into their respective columns if they appear together in DataFrame?"
However, I couldn't separate complicated measurement units, such as area units, from the measurement values in the DataFrame below.
measurments | value | unit
---|---|---
mea1 | 12.8875cm2 |
mea2 | 33.1 mL/min/1.73m2 |
mea3 | 2.53mg / dL |
mea4 | 0.005ml/ min / m2 |
mea5 | 0.8ml/m2 |
mea6 | 0.73x10^3/UL |
How can I separate a complicated measurement unit from the measurement value in a DataFrame? The expected output is shown below:
measurments | value | unit
---|---|---
mea1 | 12.8875 | cm2
mea2 | 33.1 | mL/min/1.73m2
mea3 | 2.53 | mg/dL
mea4 | 0.005 | ml/min/m2
mea5 | 0.8 | ml/m2
mea6 | 0.73 | x10^3/UL
Thanks.
Use str.extract:
df[['value', 'unit']] = df['value'].str.extract(r'(\d+\.?\d*)\s*(.*)')
Output:
measurments value unit
0 mea1 12.8875 cm2
1 mea2 33.1 mL/min/1.73m2
2 mea3 2.53 mg / dL
3 mea4 0.005 ml/ min / m2
4 mea5 0.8 ml/m2
5 mea6 0.73 x10^3/UL
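The units in this output still contain the spaces from the input ("mg / dL", "ml/ min / m2"), while the expected output in the question has them removed. A minimal sketch of one way to close that gap, assuming the column names from the question and that no unit needs internal whitespace: extract with an escaped dot (so an integer value such as "5mg" is not split as "5m" and "g"), then strip whitespace from the unit. Converting the value to float is an extra step not in the original answer.

```python
import pandas as pd

# Sample data copied from the question (the column name "measurments" is kept as-is).
df = pd.DataFrame({
    "measurments": ["mea1", "mea2", "mea3", "mea4", "mea5", "mea6"],
    "value": ["12.8875cm2", "33.1 mL/min/1.73m2", "2.53mg / dL",
              "0.005ml/ min / m2", "0.8ml/m2", "0.73x10^3/UL"],
})

# Group 1: leading number with an optional decimal part (the dot is escaped).
# Group 2: everything after the number, treated as the unit.
parts = df["value"].str.extract(r"^\s*(\d+\.?\d*)\s*(.*)$")

# Assumption: numeric values are wanted as floats rather than strings.
df["value"] = parts[0].astype(float)

# Assumption: whitespace inside a unit is never meaningful, so remove all of it.
df["unit"] = parts[1].str.replace(r"\s+", "", regex=True)

print(df)
```

With this, "mg / dL" becomes "mg/dL" and "ml/ min / m2" becomes "ml/min/m2". The pattern does not handle negative numbers or values with no leading digit (such as ".5mg"); those would need a broader pattern.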