Optimising quartiling of columns of a pandas DataFrame?
I have multiple columns in a data frame that contain numerical data. I want to quartile each column, changing each value to either q1, q2, q3 or q4.
I currently loop through each column and change them using the pandas qcut function:
for column_name in df.columns:
    df[column_name] = pd.qcut(df[column_name].astype('float'), 4, ['q1','q2','q3','q4'])
This is very slow! Is there a faster way to do this?
I played around with the following example a little. It looks like converting from a string to float is what increases the time, though since a working example was not provided, the original type can't be known.
df[column].astype(copy=) appears to perform about the same whether it copies or not. There is not much else to go after.
import pandas as pd
import numpy as np
import random
import time

random.seed(2)
indexes = [i for i in range(1, 10000) for _ in range(10)]
# String-valued columns, to include the cost of converting to float
df = pd.DataFrame({'A': indexes, 'B': [str(random.randint(1,99)) for e in indexes], 'C':[str(random.randint(1,99)) for e in indexes], 'D':[str(random.randint(1,99)) for e in indexes]})
# Integer-valued alternative for comparison:
#df = pd.DataFrame({'A': indexes, 'B': [random.randint(1,99) for e in indexes], 'C':[random.randint(1,99) for e in indexes], 'D':[random.randint(1,99) for e in indexes]})
df_result = pd.DataFrame({'A': indexes, 'B': [random.randint(1,99) for e in indexes], 'C':[random.randint(1,99) for e in indexes], 'D':[random.randint(1,99) for e in indexes]})

def qcut(copy, x):
    # Quartile every column of df and store the labels in df_result
    for i, column_name in enumerate(df.columns):
        s = pd.qcut(df[column_name].astype('float', copy=copy), 4, labels=['q1','q2','q3','q4'])
        df_result["col %d %d" % (x, i)] = s.values

# Ten runs with copy=True
times = []
for x in range(0, 10):
    a = time.perf_counter()  # time.clock() was removed in Python 3.8
    qcut(True, x)
    b = time.perf_counter()
    times.append(b - a)
print(np.mean(times))

# Ten runs with copy=False; reset the list so the two means are not mixed
times = []
for x in range(10, 20):
    a = time.perf_counter()
    qcut(False, x)
    b = time.perf_counter()
    times.append(b - a)
print(np.mean(times))
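If the string-to-float conversion really is the expensive part, one thing to try is converting the whole frame once and then binning the numeric columns. The following is only a minimal sketch under that assumption: the frame `raw`, its column names and its size are made up for illustration, and whether this helps at all depends on the original dtypes, which the question does not show.

```python
import numpy as np
import pandas as pd

labels = ["q1", "q2", "q3", "q4"]

# Hypothetical data: numeric values stored as strings, as in the benchmark above
rng = np.random.default_rng(2)
raw = pd.DataFrame(
    rng.integers(1, 100, size=(1000, 3)).astype(str),
    columns=["B", "C", "D"],
)

# Convert every column to float in a single call
numeric = raw.astype("float64")

# Bin each numeric column into four quantile-based groups
quartiled = pd.DataFrame(
    {name: pd.qcut(numeric[name], 4, labels=labels) for name in numeric.columns}
)

print(quartiled.head())
```

Keeping the conversion and the binning as separate steps also makes it easy to time each one on its own and see where the time actually goes.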