Changing a dataframe column type to int32

Question

I am trying to pass dataframe columns from Python to Cython:

Python code

evaluate_c(
        AnimalLogicPy(data[COL_ANIMAL_ID].values,              
        data[COL_ANIMAL_POWER].values,
        )

Cython code

cpdef void evaluate_c(
        int[:] animal_ids,
        int[:] animal_power,
        ):

On the Python side, data[COL_ANIMAL_ID] and data[COL_ANIMAL_POWER] both have dtype int64.

However, I get the following error:

ValueError: Buffer dtype mismatch, expected 'int' but got 'long'

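I want to use int values in Cython. I did some reading, and I think the error occurs because the dataframe columns in question are int64, which arrives in Cython as a long, whereas it should be int32.

I have tried changing the type on the Python side with the following: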

data.astype({COL_ANIMAL_ID: 'int32'}).dtypes
data.astype({COL_ANIMAL_POWER: 'int32'}).dtypes

But I still get the ValueError.

How can I change the column type from int64 to int32 on the Python side?

Answer 1

You can convert the column to a NumPy array with the correct dtype.

There are several ways to do this; the most direct is the .to_numpy() method:

data[COL_ANIMAL_ID].to_numpy('int32')
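
As a side note on why the astype attempt in the question appears to have no effect: DataFrame.astype returns a new DataFrame and leaves the original untouched, so the result has to be assigned back. The following is a minimal sketch; the column names and values are hypothetical stand-ins for the question's COL_ANIMAL_ID and COL_ANIMAL_POWER.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the question's column-name constants.
COL_ANIMAL_ID = "animal_id"
COL_ANIMAL_POWER = "animal_power"

data = pd.DataFrame({
    COL_ANIMAL_ID: np.array([1, 2, 3], dtype=np.int64),
    COL_ANIMAL_POWER: np.array([10, 20, 30], dtype=np.int64),
})

# astype returns a new DataFrame; without assignment the result is discarded.
data.astype({COL_ANIMAL_ID: "int32"})
print(data[COL_ANIMAL_ID].dtype)  # int64

# Assigning the result back keeps the conversion.
data = data.astype({COL_ANIMAL_ID: "int32", COL_ANIMAL_POWER: "int32"})
print(data[COL_ANIMAL_ID].dtype)  # int32

# .values now yields int32 arrays.
ids = data[COL_ANIMAL_ID].values
power = data[COL_ANIMAL_POWER].values
```

With the columns converted this way, the .values arrays have dtype int32 and no longer trigger the int64 mismatch.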

To give you a minimal working example, let us assume we have the following Cython function (compiled with IPython's Cython magic for simplicity):

%%cython -c-O3 -c-march=native -a
#cython: language_level=3, boundscheck=False, wraparound=False, initializedcheck=False, cdivision=True, infer_types=True


cpdef int summer(int [:] data, int n):
    cdef int result = 0
    for i in range(n):
        result += data[i]
    return result

Then the following code works:

import pandas as pd
import numpy as np


np.random.seed(0)
df = pd.DataFrame(np.random.randint(0, 100, (3, 4)))
print(df)
#     0   1   2   3
# 0  44  47  64  67
# 1  67   9  83  21
# 2  36  87  70  88


arr = np.array(df[0], dtype=np.int32)
print(summer(arr, arr.size))  # the array is fed to the Cython func
# 147

print(summer(df[0].values.astype(np.int32), df[0].size))  # directly from the pandas Series
# 147

print(summer(df[0].to_numpy(dtype=np.int32), df[0].size))  # even more concisely
# 147

print(df[0].sum())  # checking that the result is correct
# 147
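
One caveat worth sketching: a Cython int is a C int, which NumPy exposes as np.intc (32 bits on common platforms, so the same as int32 there), and casting int64 data down to it silently wraps values that do not fit. The helper below, as_c_int, is a hypothetical name and not part of pandas or NumPy; it is one possible way to check the range before converting.

```python
import numpy as np
import pandas as pd


def as_c_int(series):
    """Hypothetical helper: convert an integer Series to an np.intc array,
    raising instead of silently wrapping when a value is out of range."""
    info = np.iinfo(np.intc)
    if len(series) and (series.min() < info.min or series.max() > info.max):
        raise OverflowError("values do not fit in a C int")
    return series.to_numpy(dtype=np.intc)


small = pd.Series(np.array([44, 67, 36], dtype=np.int64))
arr = as_c_int(small)
print(arr.dtype == np.dtype(np.intc))  # True

too_big = pd.Series(np.array([2**40], dtype=np.int64))
try:
    as_c_int(too_big)
    overflowed = False
except OverflowError:
    overflowed = True
print(overflowed)  # True
```

The resulting array can then be passed to a function declared with an int[:] argument, such as summer above.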

Solution 1
1 accepted 2020-07-02 13:38:44