How to apply the .strip().split() functions to an entire column in a Pandas dataframe

Question

Example of Dataframe: My Pandas dataframe has a column, EvaRange, which is captured in the following way.

<1000 mm
1000-1200mm
1200-1400mm
>1400mm

Desired Output: I want to perform some machine learning on the dataframe, so I need to convert each entry into a single numerical value.

So far I have managed to do this for a single row in the dataframe, but I want to apply it to the entire column.

Code Example

a = df["EvaRange"][0].strip().split('mm')[0].split('-')
b = (float(a[0])+float(a[1]))/2
b

This returns the average of the two bounds when the entry contains two numbers.

Request: Could someone please help me generalize this so that I can apply it to the entire column and accommodate the "<" and ">" values?

Answer 1

I would recommend extracting the numbers and then averaging them:

df["EvaRange"].str.extract(r"(\d+)\D*(\d+)?").astype(float).mean(axis=1)
#0    1000.0
#1    1100.0
#2    1300.0
#3    1400.0

Here, the regular expression r"(\d+)\D*(\d+)?" asks for one or more digits (a number), optionally followed by some non-digits, optionally followed by some more digits (another number). Rows with only one number get NaN in the second group, and mean(axis=1) skips NaN, so those rows keep their single value.
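To make the two-group behaviour easier to see, here is a minimal self-contained sketch of the same idea. The sample frame is rebuilt from the four values in the question, and the names parts and EvaMid are just illustrative choices, not anything from the original post.

```python
import pandas as pd

# Sample data copied from the question; the frame itself is an assumption.
df = pd.DataFrame(
    {"EvaRange": ["<1000 mm", "1000-1200mm", "1200-1400mm", ">1400mm"]}
)

# Two capture groups give two columns (0 and 1).
# Column 1 is NaN for rows that contain only one number.
parts = df["EvaRange"].str.extract(r"(\d+)\D*(\d+)?").astype(float)

# mean(axis=1) skips NaN by default, so single-number rows keep their value.
df["EvaMid"] = parts.mean(axis=1)
print(df)
```

Printing parts on its own shows which rows matched one number and which matched two, which is handy if the real column has formats beyond the four shown here.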

Answer 2

I would suggest using str.extractall to get all the numbers, then taking the mean over the first index level:

df.EvaRange.str.extractall(r"(\d+)").astype(float).groupby(level=0).mean()

         0
0   1000.0
1   1100.0
2   1300.0
3   1400.0
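For anyone who wants to see what extractall produces before averaging, here is a rough self-contained sketch. It groups on the first index level with groupby(level=0), which I believe behaves the same across pandas versions; the sample frame and the names matches and mid are assumptions for illustration.

```python
import pandas as pd

# Sample data copied from the question; the frame itself is an assumption.
df = pd.DataFrame(
    {"EvaRange": ["<1000 mm", "1000-1200mm", "1200-1400mm", ">1400mm"]}
)

# extractall returns one row per regex match, indexed by
# (original row label, match number).
matches = df["EvaRange"].str.extractall(r"(\d+)").astype(float)
print(matches)

# Average the matches that belong to the same original row.
mid = matches[0].groupby(level=0).mean()
print(mid)
```

One thing to keep in mind: rows with no digits at all produce no matches, so they are missing from the result rather than appearing as NaN.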

Building on your idea of strip and split:

(df.EvaRange
 .str.strip("<> mm")
 .str.split("-")
 .explode()
 .astype(float)
 .groupby(level=0).mean()
 )

0    1000.0
1    1100.0
2    1300.0
3    1400.0
Name: EvaRange, dtype: float64
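If you would rather stay close to your original single-row code, one option is to wrap it in a small function and map it over the column. This is only a sketch: range_to_midpoint and EvaMid are hypothetical names, and it assumes every entry looks like one of the four formats in the question.

```python
import pandas as pd


def range_to_midpoint(text):
    """Hypothetical helper: average the numbers in a string like '1000-1200mm'."""
    # Drop any leading/trailing '<', '>', spaces and 'm' characters,
    # then split the remainder on '-'.
    numbers = text.strip("<> mm").split("-")
    values = [float(n) for n in numbers]
    # One number averages to itself; two numbers give the midpoint.
    return sum(values) / len(values)


# Sample data copied from the question; the frame itself is an assumption.
df = pd.DataFrame(
    {"EvaRange": ["<1000 mm", "1000-1200mm", "1200-1400mm", ">1400mm"]}
)

# Apply the helper to every entry in the column.
df["EvaMid"] = df["EvaRange"].map(range_to_midpoint)
print(df)
```

This will likely be slower than the vectorized .str methods on large frames, but it is easy to extend with extra cases, and it raises a ValueError on malformed entries instead of silently skipping them.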

Answer 1: score 2, 2020-12-03 07:51:26

Answer 2: score 0, accepted, 2020-12-03 08:18:07