So I'm trying to sort a dataframe based on a randomly chosen row. The dataframe is given below. What I am trying to do is randomly pick a row, which I will call the centroid, and then arrange the dataframe so that the rows which are less than the centroid are above it and the rows which are greater than the centroid are below it. However, I am not sure how to do that. I have given the dataframe and data below, as well as the function I use to compare rows. I decide whether a row is less than or greater than the centroid by summing the values in the row and comparing that to the sum of the centroid.
Is there a good way to do this?
Any advice is appreciated.
def compareRows(arr1, arr2):
    # Compare two rows by the sum of their values
    arr1 = sum(arr1)
    arr2 = sum(arr2)
    return arr1 > arr2
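import numpy as np
import pandas as pd

data = np.array(pd.read_csv('https://raw.githubusercontent.com/gsprint23/cpts215/master/progassignments/files/cancer.csv', header=None))
data = data.T
#print(data)
# After the transpose, data[0] (the file's first column) supplies the labels
df = pd.DataFrame(data[1:], columns=data[0], dtype=float).T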
If you need any more information, please let me know.
Thank you for reading
Use pd.DataFrame.sample to pick the random row. Below, d is the dataframe without the random row.
sampled = df.sample(1)
d = df.drop(sampled.index)
gt = d.apply(compareRows, 1, arr2=sampled.squeeze())
pd.concat([d[~gt], sampled, d[gt]])
# Older pandas only (DataFrame.append was removed in pandas 2.0):
# d[~gt].append(sampled).append(d[gt])
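For anyone who wants to try the idea without downloading the CSV, here is a minimal self-contained sketch of the same approach on a small made-up frame. The function name partition_by_random_row, the seed parameter and the demo data are my own assumptions, not part of the question. It also compares vectorized row sums instead of calling compareRows through apply, which should give the same ordering, and it assumes the index labels are unique so that drop removes only the sampled row.

```python
import numpy as np
import pandas as pd


def partition_by_random_row(df, seed=None):
    """Place rows with a smaller-or-equal sum above a random pivot row, larger below.

    Hypothetical helper for illustration; assumes df has unique index labels.
    """
    # Pick one random row as the pivot (the "centroid" in the question)
    pivot = df.sample(1, random_state=seed)
    # Everything except the pivot row
    rest = df.drop(pivot.index)
    # Row sums, compared against the pivot's sum
    pivot_sum = pivot.sum(axis=1).iloc[0]
    greater = rest.sum(axis=1) > pivot_sum
    # Rows that are not greater go first, then the pivot, then the greater rows
    ordered = pd.concat([rest[~greater], pivot, rest[greater]])
    return ordered, pivot.index[0]


# Made-up demo data: 5 rows labelled a..e with 4 numeric columns each
demo = pd.DataFrame(
    np.arange(20, dtype=float).reshape(5, 4),
    index=list("abcde"),
)
ordered, pivot_label = partition_by_random_row(demo, seed=0)
print("pivot row:", pivot_label)
print(ordered)
```

Note that this only partitions the rows around the pivot. Rows within each side keep their original relative order and are not fully sorted by sum.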