The objective is to delete rows based on multiple columns.
If the array has size Nx3, drop every row that does not satisfy Column0 >= Column1 >= Column2. If the array has size Nx6, drop every row that does not satisfy both Column0 >= Column1 >= Column2 and Column3 >= Column4 >= Column5. The same rule applies to an array of size NxM, where M is a multiple of 3.
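To make the rule concrete, here is a small Nx6 case with made-up values: a row survives only if both column triples, (0, 1, 2) and (3, 4, 5), are non-increasing, and ties count as satisfying >=.

```python
import numpy as np

# Hypothetical 4 x 6 input: two column groups, (0, 1, 2) and (3, 4, 5)
a = np.array([
    [3, 2, 1, 5, 5, 4],  # both groups non-increasing -> keep
    [3, 2, 1, 4, 5, 4],  # second group breaks (4 < 5) -> drop
    [1, 2, 3, 9, 8, 7],  # first group breaks (1 < 2) -> drop
    [7, 7, 7, 1, 1, 1],  # ties are allowed by >= -> keep
])

# Column indices grouped in consecutive triples: [[0, 1, 2], [3, 4, 5]]
groups = np.arange(a.shape[1]).reshape(-1, 3)

keep = np.ones(len(a), dtype=bool)
for c0, c1, c2 in groups:
    keep &= (a[:, c0] >= a[:, c1]) & (a[:, c1] >= a[:, c2])

print(a[keep])  # rows 0 and 3
```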
The following code meets this requirement:
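import numpy as np
import pandas as pd

# All 4**12 combinations of the values, one column per dimension
arr = np.meshgrid(*[[1, 2, 3, 10] for _ in range(12)])
df = pd.DataFrame(list(map(np.ravel, arr))).transpose()
df_len = len(df.columns)
# Column indices grouped in consecutive triples: [[0, 1, 2], [3, 4, 5], ...]
a_list = np.arange(df_len).reshape((-1, 3))
for x in range(len(a_list)):
    mask = (df[a_list[x, 0]] >= df[a_list[x, 1]]) & (df[a_list[x, 1]] >= df[a_list[x, 2]])
    # Drop the rows that violate this triple's ordering
    df.drop(df.loc[~mask].index, inplace=True)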
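One variation on this loop, sketched below, builds a single combined boolean mask and indexes the frame once, rather than calling df.drop(..., inplace=True) once per triple. It assumes the column labels can simply be read in consecutive triples; the helper name filter_groups is hypothetical, and whether it beats the shrinking-frame loop will depend on the data. The usage uses 6 dimensions to keep the grid small.

```python
import numpy as np
import pandas as pd

def filter_groups(df: pd.DataFrame) -> pd.DataFrame:
    """Keep rows where every consecutive column triple is non-increasing (hypothetical helper)."""
    cols = list(df.columns)
    mask = pd.Series(True, index=df.index)
    for i in range(0, len(cols), 3):
        c0, c1, c2 = cols[i:i + 3]
        # Accumulate this triple's condition into the single combined mask
        mask &= (df[c0] >= df[c1]) & (df[c1] >= df[c2])
    # Index once at the end instead of dropping rows triple by triple
    return df[mask]

arr = np.meshgrid(*[[1, 2, 3, 10] for _ in range(6)])
df = pd.DataFrame(np.stack([g.ravel() for g in arr], axis=1))
out = filter_groups(df)
print(len(out))  # 400
```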
However, the code above becomes slow as the dimension and the number of candidate values (list_no) grow. How can it be improved?
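A different technique, not the one used in the answers here: since each triple is constrained independently, the valid rows are exactly the Cartesian product of the valid (non-increasing) triples, so they can be generated directly instead of filtering the full meshgrid. This sketch assumes the values in list_no are distinct; the function name valid_rows is hypothetical, and the row order differs from the meshgrid order. With 4 values there are 20 non-increasing triples instead of 64, so dimension 9 yields 20**3 = 8000 rows without ever building the 4**9 = 262144-row grid.

```python
import itertools
import numpy as np

def valid_rows(values, dimension):
    """Generate only the rows whose consecutive column triples are non-increasing.

    Assumes `values` are distinct and `dimension` is a positive multiple of 3.
    """
    if dimension <= 0 or dimension % 3:
        raise ValueError("dimension must be a positive multiple of 3")
    desc = sorted(values, reverse=True)
    # combinations_with_replacement keeps input order, so over a descending list
    # it yields exactly the non-increasing triples, e.g. (10, 3, 3) but never (3, 10, 3)
    triples = np.array(list(itertools.combinations_with_replacement(desc, 3)))
    groups = dimension // 3
    # Every way of choosing one triple per column group: shape (len(triples)**groups, groups)
    idx = np.indices((len(triples),) * groups).reshape(groups, -1).T
    # triples[idx] has shape (rows, groups, 3); flatten the last two axes into columns
    return triples[idx].reshape(len(idx), dimension)

rows = valid_rows([1, 2, 3, 10], 9)
print(rows.shape)  # (8000, 9)
```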
Working directly with a NumPy array instead of a pandas DataFrame significantly reduces the overall computation time:
import numpy as np

dimension = 9
list_no = [1, 2, 3, 10]
arr = np.meshgrid(*[list_no for _ in range(dimension)])
a = np.array(list(map(np.ravel, arr))).transpose()
num_rows, num_cols = a.shape
# Column indices grouped in consecutive triples
a_list = np.arange(num_cols).reshape((-1, 3))
for x in range(len(a_list)):
    # Keep only the rows whose current triple is non-increasing
    a = a[(a[:, a_list[x, 0]] >= a[:, a_list[x, 1]]) & (a[:, a_list[x, 1]] >= a[:, a_list[x, 2]])]
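The per-triple loop can also be replaced by a single vectorized comparison: reshape the N x M array to (N, M/3, 3) so that each column triple lies along the last axis, then require every element to be >= its right neighbour. This is a sketch assuming M is a multiple of 3; keep_non_increasing is a hypothetical helper name.

```python
import numpy as np

def keep_non_increasing(a: np.ndarray) -> np.ndarray:
    """Return the rows of `a` (N x M, M a multiple of 3) whose every
    consecutive column triple satisfies c0 >= c1 >= c2 (hypothetical helper)."""
    n, m = a.shape
    triples = a.reshape(n, m // 3, 3)  # (N, M/3, 3): one slice per column group
    # Compare each element with its right neighbour inside every triple
    ok = (triples[:, :, :-1] >= triples[:, :, 1:]).all(axis=(1, 2))
    return a[ok]

dimension = 9
list_no = [1, 2, 3, 10]
grids = np.meshgrid(*[list_no for _ in range(dimension)])
a = np.stack([g.ravel() for g in grids], axis=1)
out = keep_non_increasing(a)
print(out.shape)  # (8000, 9)
```

Comparing neighbours directly, rather than using np.diff, also avoids wrap-around if the array happens to have an unsigned dtype.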
Here is my proposal for the NxM problem:
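import numpy as np
import pandas as pd

n = 10000
m = 9
df = pd.DataFrame(np.random.randint(0, n, size=(n, m)))

def condition(col):
    # True if any consecutive column triple of this row is NOT non-increasing
    res = ((col[i] >= col[i + 1]) & (col[i + 1] >= col[i + 2]) for i in range(0, m, 3))
    return not all(res)

df['D'] = df.apply(condition, axis=1)
# drop() returns a new frame, so assign it; also remove the helper column
df = df.drop(df[df.D].index).drop(columns='D')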
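Because DataFrame.apply with axis=1 calls a Python function once per row, a vectorized mask on the underlying array is likely to be much faster. The sketch below uses a seeded numpy.random.default_rng generator instead of np.random.randint (an assumption, for reproducibility) and checks that both approaches keep the same rows.

```python
import numpy as np
import pandas as pd

n, m = 10000, 9
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, n, size=(n, m)))

# Row-wise version, as in the proposal above: True means "violates the rule"
def violates(row):
    return not all(row[i] >= row[i + 1] >= row[i + 2] for i in range(0, m, 3))

slow = df[~df.apply(violates, axis=1)]

# Vectorized version: compare each column with its right neighbour inside each triple
v = df.to_numpy().reshape(len(df), m // 3, 3)
fast = df[(v[:, :, :-1] >= v[:, :, 1:]).all(axis=(1, 2))]

print(slow.equals(fast))
```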