Converting a pandas Series of list strings to a numpy array

Question

I want to convert a pandas Series of strings, each holding a list of numbers, into a numpy array. What I have looks like this:

ds = pd.Series(['[1 -2 0 1.2 4.34]', '[3.3 4 0 -1 9.1]'])

The output I want:

arr = np.array([[1, -2, 0, 1.2, 4.34], [3.3, 4, 0, -1, 9.1]])

So far, I have converted the pandas Series into a Series of lists of numbers:

ds1 = ds.apply(lambda x: [float(number) for number in x.strip('[]').split(' ')])

But I don't know how to go from ds1 to arr.

Answer 1

Use Series.str.strip and Series.str.split, then build a new np.array with dtype=float:

arr = np.array(ds.str.strip('[]').str.split().tolist(), dtype='float')

Result:

print(repr(arr))

array([[ 1.  , -2.  ,  0.  ,  1.2 ,  4.34],
       [ 3.3 ,  4.  ,  0.  , -1.  ,  9.1 ]])
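To tie this back to the question, here is a minimal sketch showing that the asker's intermediate ds1 (a Series of Python lists) can also be turned into a 2-D array by passing ds1.tolist() to np.array, and that it matches the one-liner above. The name arr_from_ds1 is only illustrative, and the approach assumes every row has the same number of values.

```python
import numpy as np
import pandas as pd

ds = pd.Series(['[1 -2 0 1.2 4.34]', '[3.3 4 0 -1 9.1]'])

# The asker's intermediate step: a Series whose elements are lists of floats.
ds1 = ds.apply(lambda x: [float(number) for number in x.strip('[]').split(' ')])

# Going from ds1 to a 2-D array: turn the Series into a list of lists first.
# This assumes all rows have the same length.
arr_from_ds1 = np.array(ds1.tolist())

# The one-liner from this answer: np.array parses the string tokens as floats.
arr = np.array(ds.str.strip('[]').str.split().tolist(), dtype='float')

print(arr_from_ds1.shape)  # (2, 5)
print(arr.dtype)           # float64
```

Calling np.array(ds1) directly would not give the same result: it produces a 1-D object array of lists, which is why the tolist() step matters.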

Answer 2

You can first strip the "[]" from the object Series, after which things get easier. See https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.split.html.

ds1 = ds.str.strip("[]")
# split and expand the data, then convert to a numpy array
arr = ds1.str.split(" ", expand=True).to_numpy(dtype=float)

arr will then be in the format you want:

array([[ 1.  , -2.  ,  0.  ,  1.2 ,  4.34],
       [ 3.3 ,  4.  ,  0.  , -1.  ,  9.1 ]])
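One detail worth a quick sketch: splitting on a literal " " treats every single space as a separator, so input with repeated spaces yields empty-string cells that cannot be parsed as floats. Calling str.split with no separator splits on runs of whitespace instead. The irregularly spaced Series below is hypothetical and not part of the original question.

```python
import pandas as pd

# Hypothetical input with irregular spacing between the numbers.
ds_irregular = pd.Series(['[1  -2 0 1.2 4.34]', '[3.3 4   0 -1 9.1]'])
stripped = ds_irregular.str.strip('[]')

# Splitting on a single space leaves empty-string cells and extra columns.
with_empties = stripped.str.split(' ', expand=True)

# Splitting with no separator collapses runs of whitespace.
clean = stripped.str.split(expand=True)
arr_irregular = clean.astype(float).to_numpy()

print(with_empties.shape)  # (2, 7)
print(clean.shape)         # (2, 5)
```

For the exact data in the question, which uses single spaces throughout, both forms of split give the same columns.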

I then ran a small benchmark comparing this with Shubham's solution.

# Shubham's way
%timeit arr = np.array(ds.str.strip('[]').str.split().tolist(), dtype='float')
332 µs ± 5.72 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

# my way
%timeit ds.str.strip("[]").str.split(" ", expand=True).to_numpy(dtype=float)
741 µs ± 4.21 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Clearly, his solution is much faster! Cheers!
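%timeit is an IPython magic. Outside IPython, roughly the same comparison can be sketched with the standard-library timeit module. The helper names via_tolist and via_expand and the repetition count are illustrative choices, and absolute timings will vary with the machine and the pandas and numpy versions.

```python
import timeit

import numpy as np
import pandas as pd

ds = pd.Series(['[1 -2 0 1.2 4.34]', '[3.3 4 0 -1 9.1]'])


def via_tolist():
    # Answer 1: split into lists, then let np.array parse the tokens.
    return np.array(ds.str.strip('[]').str.split().tolist(), dtype='float')


def via_expand():
    # Answer 2: expand the tokens into DataFrame columns, then convert.
    return ds.str.strip('[]').str.split(' ', expand=True).to_numpy(dtype=float)


n_runs = 200  # arbitrary repetition count for this sketch
t_tolist = timeit.timeit(via_tolist, number=n_runs) / n_runs
t_expand = timeit.timeit(via_expand, number=n_runs) / n_runs

print(f"tolist: {t_tolist * 1e6:.0f} µs per call")
print(f"expand: {t_expand * 1e6:.0f} µs per call")
```

Both functions return the same array, so the comparison is only about speed. The timings above were measured on a two-row Series, so it may be worth rerunning on data of realistic size before choosing between them.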

2 solutions

Solution 1
5 Accepted 2020-08-20 12:52:36

Solution 2
1 2020-08-20 16:48:01
