
Adding rows to a Pandas dataframe from another dataframe

So I'm trying to sort a dataframe based on a randomly chosen row. The dataframe is listed below. What I am trying to do is randomly pick a row, which I will call the centroid, and then rearrange the dataframe so that the rows that are less than the centroid are above it and the rows that are greater than the centroid are below it. However, I am not sure how to do that. I have given the dataframe and data below, as well as the function I use to compare rows. I decide whether a row is less than or greater than the centroid by summing the values in the row and comparing that to the sum of the centroid's values.

Is there a good way to do this?

Any advice is appreciated.

import numpy as np
import pandas as pd

def compareRows(arr1, arr2):
    # A row is "greater" when the sum of its values is larger
    return sum(arr1) > sum(arr2)

data = np.array(pd.read_csv('https://raw.githubusercontent.com/gsprint23/cpts215/master/progassignments/files/cancer.csv',  header=None))
data = data.T
# data[0] supplies the labels, data[1:] the float values
df = pd.DataFrame(data[1:], columns=data[0], dtype=float).T

If you need any more information, please let me know.

Thank you for reading

  • Grab one row at random with pd.DataFrame.sample
    • Note: this returns a one-row dataframe, not a series
  • Create a temporary dataframe d without the random row
  • Create a boolean series that marks which of the other rows are greater than the random row
  • Concatenate three pieces in order: the rows of d that are not greater than the random row, the random row itself, and the rows of d that are greater

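# Pick one row at random; the result is a one-row dataframe
sampled = df.sample(1)
# Every other row
d = df.drop(sampled.index)
# True where a row's sum is greater than the sampled row's sum
gt = d.apply(compareRows, axis=1, arr2=sampled.squeeze())

# Rows not greater, then the sampled row, then rows greater
pd.concat([d[~gt], sampled, d[gt]])
# On pandas versions before 2.0 (where DataFrame.append still exists):
# d[~gt].append(sampled).append(d[gt])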
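
For anyone who wants to try this without downloading the CSV, here is a minimal sketch of the same idea on a small made-up dataframe. The toy data, the function name partition_around_random_row, and the random_state argument are assumptions for illustration and are not part of the original question. It also swaps the row-by-row apply for a vectorized sum(axis=1), which should give the same ordering when all columns are numeric. It assumes the index labels are unique, since drop removes rows by label.

```python
import pandas as pd

# Hypothetical toy data standing in for the cancer dataset
df = pd.DataFrame(
    {"a": [5, 1, 9, 3, 7], "b": [2, 0, 4, 1, 6]},
    index=["r0", "r1", "r2", "r3", "r4"],
)


def partition_around_random_row(frame, random_state=None):
    """Reorder rows around a randomly chosen row, compared by row sum.

    Returns the reordered frame and the index label of the chosen row.
    Assumes unique index labels and numeric columns.
    """
    sampled = frame.sample(1, random_state=random_state)  # one-row dataframe
    rest = frame.drop(sampled.index)                      # all other rows
    centroid_sum = sampled.sum(axis=1).iloc[0]            # scalar sum of the chosen row
    gt = rest.sum(axis=1) > centroid_sum                  # boolean series over the other rows
    # Rows not greater, then the chosen row, then rows greater
    reordered = pd.concat([rest[~gt], sampled, rest[gt]])
    return reordered, sampled.index[0]


result, centroid = partition_around_random_row(df, random_state=0)
print("centroid:", centroid)
print(result)
```

Rows whose sum ties with the chosen row end up above it, which matches the behaviour of compareRows with a strict greater-than. Within each group the rows keep their original relative order; this partitions the dataframe around the chosen row rather than fully sorting it.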
