
Randomly draw a sample for 2 columns

A well-known function for this in Python is random.sample().

However, my dataset consists of multiple columns, and I need the 'lat' and 'lng' coordinates to be sampled together. Because the two are related, I cannot call random.sample() on each column separately; that would give me random lat coordinates paired with lng coordinates that don't belong to them.

What would be the most elegant solution for this?

Perhaps I could first make a third column in which I combine lat and lng, then sample, then split it again?

If so, how should I do this? The fact that both lat and lng values are floats of different lengths doesn't make it easier. Perhaps by adding a '-' in between?
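The combine, sample, split idea works without any string separator (a '-' would also clash with negative coordinates) if each lat is combined with its lng into a tuple using zip(). random.sample() then draws whole pairs, and the pairs are split back into two lists afterwards. A minimal sketch, assuming the coordinates are held in two equal-length Python lists; the function name sample_pairs and the coordinate values are made up for illustration:

```python
import random


def sample_pairs(lats, lngs, k, seed=None):
    """Draw k (lat, lng) pairs without replacement, keeping each lat with its own lng."""
    if len(lats) != len(lngs):
        raise ValueError("lats and lngs must have the same length")
    rng = random.Random(seed)
    # "Combine": zip pairs each lat with the lng from the same row, no string joining needed
    pairs = list(zip(lats, lngs))
    # "Sample": k distinct pairs are drawn, so a pair is never split up
    chosen = rng.sample(pairs, k)
    # "Split": separate the chosen pairs back into two lists (works for k == 0 too)
    sampled_lats = [lat for lat, _ in chosen]
    sampled_lngs = [lng for _, lng in chosen]
    return sampled_lats, sampled_lngs


# Made-up example coordinates
lats = [52.3702, 48.8566, 51.5074, 40.4168]
lngs = [4.8952, 2.3522, -0.1278, -3.7038]
sampled_lats, sampled_lngs = sample_pairs(lats, lngs, k=2, seed=1)
print(list(zip(sampled_lats, sampled_lngs)))
```

If you only need the pairs themselves, random.sample(list(zip(lats, lngs)), k) on its own is enough.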

Essentially, you're talking about sampling an entire row, which has values [lat_i, lng_i]. This leads to a very simple (but perhaps too verbose) solution:

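import random
# Assumes dataset is a 2-D NumPy array; pick one row index at random and take the whole row so lat and lng stay paired
random_row_index = random.randint(0, dataset.shape[0] - 1)
random_row = dataset[random_row_index, :]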
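To draw several distinct rows at once rather than a single one, the same idea can be applied to an array of row indices. A minimal sketch, assuming dataset is a 2-D NumPy array with lat in column 0 and lng in column 1; sample_rows and the example values are hypothetical:

```python
import numpy as np


def sample_rows(dataset, k, seed=None):
    """Return k distinct rows of a 2-D array; each row keeps its own lat and lng."""
    rng = np.random.default_rng(seed)
    # Choose k distinct row indices (replace=False means no row is picked twice)
    row_indices = rng.choice(dataset.shape[0], size=k, replace=False)
    # Indexing with the index array returns whole rows, so the columns stay aligned
    return dataset[row_indices, :]


# Hypothetical data: column 0 is lat, column 1 is lng
dataset = np.array([
    [52.3702, 4.8952],
    [48.8566, 2.3522],
    [51.5074, -0.1278],
    [40.4168, -3.7038],
])
sample = sample_rows(dataset, k=3, seed=0)
print(sample)
```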

If you have a pandas DataFrame, simply use DataFrame.sample.
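A minimal sketch of DataFrame.sample, assuming the data frame has 'lat' and 'lng' columns; the extra 'name' column and all values are made up for illustration:

```python
import pandas as pd

# Hypothetical data frame with 'lat' and 'lng' plus another column
df = pd.DataFrame({
    "name": ["Amsterdam", "Paris", "London", "Madrid"],
    "lat": [52.3702, 48.8566, 51.5074, 40.4168],
    "lng": [4.8952, 2.3522, -0.1278, -3.7038],
})

# DataFrame.sample draws whole rows (without replacement by default),
# so every sampled lat stays next to its own lng.
# random_state makes the draw reproducible.
sample = df[["lat", "lng"]].sample(n=2, random_state=42)
print(sample)
```

Passing frac=0.1 instead of n draws 10% of the rows. The sampled rows keep their original index labels, so the other columns can still be looked up with df.loc[sample.index].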

That is what train_test_split is made for: https://realpython.com/train-test-split-python-data/

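from sklearn.model_selection import train_test_split
# Rows of x and y are shuffled together, so x_test[i] still belongs with y_test[i]
# (for example, x could hold the lat values and y the lng values)
x_train, x_test, y_train, y_test = train_test_split(x, y)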


 