
Python pandas dataframe: shorten the conversion time from hex string to int

My intention is to convert the whole dataframe from hex string to int. Currently I am able to do it based on the answer provided at "pandas dataframe.apply -- converting hex string to int number":

df = df.apply(lambda x: x.astype(str).map(lambda x: int(x, base=16)))

However, it runs very slowly, especially when the dataframe is big. I saw an answer at https://stackoverflow.com/a/52855646/5057185 saying that the lambda isn't necessary and adds overhead. I tried to implement that, but I got an error:

df2 = pd.read_csv(path+temp_file, dtype=str)
df2 = df2.dropna()
df2 = df2.apply(int,base=16)

df2 = df2.apply(int,base=16)
Traceback (most recent call last):
  File "", line 1, in
  File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 6487, in apply
    return op.get_result()
  File "C:\Python27\lib\site-packages\pandas\core\apply.py", line 151, in get_result
    return self.apply_standard()
  File "C:\Python27\lib\site-packages\pandas\core\apply.py", line 257, in apply_standard
    self.apply_series_generator()
  File "C:\Python27\lib\site-packages\pandas\core\apply.py", line 286, in apply_series_generator
    results[i] = self.f(v)
  File "C:\Python27\lib\site-packages\pandas\core\apply.py", line 78, in f
    return func(x, *args, **kwds)
TypeError: ("int() can't convert non-string with explicit base", u'occurred at index POWERON')

I believe this error occurs because the dtype of the dataframe is object instead of string. This problem is known and solved in newer versions of pandas with pd.read_csv(path+temp_file, dtype="string"), but I am using an old version of pandas. How can I work around this, or is there any other method to convert the dataframe faster?

I think you need DataFrame.applymap for elementwise processing, because DataFrame.apply passes each whole column (a Series) to int rather than the individual strings:

df2 = df2.applymap(lambda x: int(x,base=16))
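
To illustrate the difference, here is a minimal sketch on a made-up two-column frame (the column names and values are illustrative, not taken from the question). It first shows that passing int directly to DataFrame.apply fails, then converts cell by cell. On pandas 2.1 and later applymap is deprecated in favour of DataFrame.map, so the sketch uses whichever of the two is available.

```python
import pandas as pd

# Hypothetical sample data: every cell is a hex string, as with dtype=str.
df2 = pd.DataFrame({"POWERON": ["1a", "ff"], "STATUS": ["0", "10"]})

# DataFrame.apply hands each column (a Series) to int, and int cannot
# parse a non-string object when an explicit base is given.
try:
    df2.apply(int, base=16)
    apply_failed = False
except (TypeError, ValueError):
    apply_failed = True

# Elementwise conversion: the function receives one cell at a time.
# DataFrame.map replaces applymap in pandas >= 2.1; older versions
# only provide applymap.
elementwise = df2.map if hasattr(pd.DataFrame, "map") else df2.applymap
converted = elementwise(lambda x: int(x, base=16))

print(apply_failed)
print(converted)
```

This still calls int once per cell, so it mainly removes the error rather than guaranteeing a large speed-up.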

Another idea is to reshape with DataFrame.stack and Series.unstack:

df2 = df2.stack().apply(lambda x: int(x, 16)).unstack()
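
A possible variation on the stack/unstack idea, offered as an untested assumption rather than a measured result: if the data contains many repeated hex strings, each distinct string can be parsed once and the results looked up through a dictionary. The sample frame and the name lookup below are hypothetical.

```python
import pandas as pd

# Hypothetical sample data with repeated hex strings.
df2 = pd.DataFrame({"a": ["ff", "1a", "ff"], "b": ["10", "ff", "1a"]})

# Flatten to one long Series indexed by (row, column).
flat = df2.stack()

# Parse each distinct string only once, then map every cell through
# the resulting dictionary and restore the original 2-D shape.
lookup = {h: int(h, 16) for h in flat.unique()}
converted = flat.map(lookup).unstack()

print(converted)
```

Whether this beats the plain elementwise call depends on how many duplicates the data has; with mostly unique values it does the same amount of parsing plus the cost of building the dictionary.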
