Python pandas dataframe: shorten the conversion time from hex string to int

Question

My intention is to convert the whole dataframe from hex strings to int. Currently I am able to do it based on the answer provided at "pandas dataframe.apply -- converting hex string to int number":

df = df.apply(lambda x: x.astype(str).map(lambda x: int(x, base=16)))

However, it runs very slowly, especially when the dataframe is big. I saw an answer at https://stackoverflow.com/a/52855646/5057185 saying that the lambda isn't necessary and adds overhead. I tried to implement that, but I got this error:

df2 = pd.read_csv(path+temp_file, dtype=str)
df2 = df2.dropna()
df2 = df2.apply(int,base=16)

df2 = df2.apply(int,base=16)
Traceback (most recent call last):
  File "", line 1, in
  File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 6487, in apply
    return op.get_result()
  File "C:\Python27\lib\site-packages\pandas\core\apply.py", line 151, in get_result
    return self.apply_standard()
  File "C:\Python27\lib\site-packages\pandas\core\apply.py", line 257, in apply_standard
    self.apply_series_generator()
  File "C:\Python27\lib\site-packages\pandas\core\apply.py", line 286, in apply_series_generator
    results[i] = self.f(v)
  File "C:\Python27\lib\site-packages\pandas\core\apply.py", line 78, in f
    return func(x, *args, **kwds)
TypeError: ("int() can't convert non-string with explicit base", u'occurred at index POWERON')

I believe this error occurs because the dtype of the dataframe is object instead of string, a problem that is known and solved in newer versions of pandas with pd.read_csv(path+temp_file, dtype="string"). I am using an old version of pandas. How can I work around this, or is there another method to convert the dataframe faster?
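A minimal sketch for checking the error and applying the lambda-free idea on an older pandas, assuming a small hypothetical frame in place of the CSV (the `STATUS` column and its values are made up; `POWERON` is borrowed from the traceback). The first part shows that `DataFrame.apply` calls the function once per column with a whole `Series`, so `int(series, base=16)` fails whatever the column dtype is. The second part uses `Series.apply`, which forwards extra keyword arguments to the function and calls it once per element, so `int(value, base=16)` runs without a lambda.

```python
import pandas as pd

# Hypothetical stand-in for pd.read_csv(...); POWERON comes from the traceback,
# STATUS and all values are invented for illustration.
df2 = pd.DataFrame({"POWERON": ["ff", "1A"], "STATUS": ["0", "10"]})

# DataFrame.apply hands each column to the function as a whole Series.
column_types = df2.apply(lambda col: type(col).__name__).tolist()
print(column_types)  # ['Series', 'Series']

# So int() receives a Series, not a string, and raises TypeError.
error_message = None
try:
    df2.apply(int, base=16)
except TypeError as exc:
    error_message = str(exc)
print("TypeError:", error_message)

# Series.apply calls int(value, base=16) once per element; no lambda needed.
converted = df2.copy()
for col in converted.columns:
    converted[col] = converted[col].apply(int, base=16)

print(converted)
```

Whether this loop is faster than the original nested-lambda version on a given dataset is not something this sketch measures; timing it on the real CSV would settle that.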

Answer 1

I think you need DataFrame.applymap for elementwise processing, because DataFrame.apply passes each whole column (a Series) to the function rather than the individual values:

df2 = df2.applymap(lambda x: int(x,base=16))
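A self-contained sketch of the `applymap` approach, assuming the same kind of hypothetical two-column frame of hex strings. pandas 2.1 renamed `DataFrame.applymap` to `DataFrame.map` and deprecated the old name, so the sketch picks whichever method the installed version provides; the `elementwise` name is just a local helper introduced here.

```python
import pandas as pd

# Hypothetical data standing in for the CSV after dropna().
df2 = pd.DataFrame({"POWERON": ["ff", "1A"], "STATUS": ["0", "10"]})

# applymap (DataFrame.map in pandas >= 2.1) calls the function once per cell.
elementwise = df2.map if hasattr(pd.DataFrame, "map") else df2.applymap
converted = elementwise(lambda x: int(x, base=16))

print(converted)
```

On the old pandas from the question only the `applymap` branch is taken, so the line reduces to the answer's one-liner.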

Another idea is to reshape with DataFrame.stack and Series.unstack:

df2 = df2.stack().apply(lambda x: int(x, 16)).unstack()
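A sketch of the stack/unstack idea on a hypothetical frame: `stack()` turns the frame into one long Series keyed by (row, column), a single `Series.apply` converts every cell, and `unstack()` rebuilds the frame. Depending on the pandas version, `unstack()` may return the columns in sorted order, so the sketch reselects the original column order at the end; older `stack()` implementations also drop missing cells, which does not matter here because the question already calls `dropna()`.

```python
import pandas as pd

# Hypothetical data standing in for the CSV after dropna().
df2 = pd.DataFrame({"POWERON": ["ff", "1A"], "STATUS": ["0", "10"]})

# One long Series indexed by (row label, column label).
long_form = df2.stack()

# Convert every cell in a single pass, then pivot the column level back out.
converted = long_form.apply(lambda x: int(x, 16)).unstack()

# Restore the original column order in case unstack() reordered it.
converted = converted[df2.columns]
print(converted)
```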

Answer 1 (accepted, 2020-09-09 08:56:46)