How to convert a pandas Series of lists or tuples to a Series of numpy arrays

I have a CSV file with x, y, and z columns that represent coordinates in 3-dimensional space. I need to create a distance matrix giving the distance from each item to every other item.
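Concretely, for points p_i = (x_i, y_i, z_i), the Euclidean distance matrix has the entries below. This is also what scipy.spatial.distance_matrix computes with its default p=2. For example, rows a (1, 2, 3) and b (2, 3, 4) in the sample data are √3 ≈ 1.732 apart.

$$
D_{ij} = \lVert \mathbf{p}_i - \mathbf{p}_j \rVert_2 = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2 + (z_i - z_j)^2}
$$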

I can easily read the CSV with the pandas read_csv function, which gives a DataFrame like the following:

import pandas as pd
import numpy as np

samples = pd.DataFrame(
    columns=['source', 'name', 'x', 'y', 'z'],
    data = [['a', 'apple', 1.0, 2.0, 3.0],
            ['b', 'pear', 2.0, 3.0, 4.0],
            ['c', 'tomato', 9.0, 8.0, 7.0],
            ['d', 'sandwich', 6.0, 5.0, 4.0]]
)
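Here is a minimal sketch of the read_csv step. The CSV text is hypothetical and only reproduces the sample rows above. io.StringIO stands in for a real file path.

```python
import io

import pandas as pd

# Hypothetical CSV contents mirroring the sample DataFrame; with a real file,
# pass its path to pd.read_csv instead of a StringIO buffer.
csv_text = """source,name,x,y,z
a,apple,1.0,2.0,3.0
b,pear,2.0,3.0,4.0
c,tomato,9.0,8.0,7.0
d,sandwich,6.0,5.0,4.0
"""

# The header row supplies the column names; x, y, z are parsed as floats.
samples = pd.read_csv(io.StringIO(csv_text))
print(samples)
```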

I can then convert the separate x, y, z columns into a Series of tuples:

samples['coord'] = samples.apply(
    lambda row: (row['x'], row['y'], row['z']),
    axis=1
)

or a Series of lists:

samples['coord'] = samples.apply(
    lambda row: [row['x'], row['y'], row['z']],
    axis=1
)
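Neither the tuples nor the lists strictly need a row-wise apply. Below is a sketch of vectorized equivalents on the same sample frame. The column names coord_tuple and coord_list are hypothetical and chosen only so both results can sit side by side.

```python
import pandas as pd

samples = pd.DataFrame(
    columns=['source', 'name', 'x', 'y', 'z'],
    data=[['a', 'apple', 1.0, 2.0, 3.0],
          ['b', 'pear', 2.0, 3.0, 4.0],
          ['c', 'tomato', 9.0, 8.0, 7.0],
          ['d', 'sandwich', 6.0, 5.0, 4.0]]
)

# Tuples: zip the three columns together, one tuple per row.
samples['coord_tuple'] = list(zip(samples['x'], samples['y'], samples['z']))

# Lists: take the 3-column block as a 2-D array, then convert to nested lists.
samples['coord_list'] = samples[['x', 'y', 'z']].to_numpy().tolist()

print(samples.loc[0, 'coord_tuple'])
print(samples.loc[0, 'coord_list'])
```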

But I cannot create a Series of arrays:

samples['coord'] = samples.apply(
    lambda row: np.array([row['x'], row['y'], row['z']]),
    axis=1
)

I get a ValueError: "Shape of passed values is (4,3), indices imply (4,6)".
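The shapes in that message suggest that the pandas version in use tried to build a DataFrame from the returned 3-element arrays using the frame's six existing column labels. The sketch below assumes pandas 0.23 or later, where DataFrame.apply has a result_type parameter. Passing result_type='reduce' asks apply to return a Series instead of expanding list-like results. Per the apply documentation, the default (result_type=None) in recent versions also returns list-like results as a Series, but being explicit documents the intent.

```python
import numpy as np
import pandas as pd

samples = pd.DataFrame(
    columns=['source', 'name', 'x', 'y', 'z'],
    data=[['a', 'apple', 1.0, 2.0, 3.0],
          ['b', 'pear', 2.0, 3.0, 4.0],
          ['c', 'tomato', 9.0, 8.0, 7.0],
          ['d', 'sandwich', 6.0, 5.0, 4.0]]
)

# result_type='reduce' (pandas >= 0.23) keeps each returned array as a single
# Series element instead of spreading it across columns.
samples['coord'] = samples.apply(
    lambda row: np.array([row['x'], row['y'], row['z']], dtype=float),
    axis=1,
    result_type='reduce',
)

print(type(samples.loc[0, 'coord']))
```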

I'd really like to have the data prepared so that I can simply call SciPy's distance_matrix function, which takes two 2-D arrays of points, as follows:

import scipy.spatial

dmat = scipy.spatial.distance_matrix(
    samples['coord'].values,
    samples['coord'].values
)
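One caveat about that call. distance_matrix is documented as taking arrays of shape (M, K) and (N, K). When each element of a Series is an array, .values is a 1-D object array of shape (4,), not a (4, 3) float array. So the per-row arrays likely need to be stacked back into 2-D first. The sketch below builds the column with list(...) purely as setup and uses np.stack for the conversion.

```python
import numpy as np
import pandas as pd

samples = pd.DataFrame(
    columns=['source', 'name', 'x', 'y', 'z'],
    data=[['a', 'apple', 1.0, 2.0, 3.0],
          ['b', 'pear', 2.0, 3.0, 4.0],
          ['c', 'tomato', 9.0, 8.0, 7.0],
          ['d', 'sandwich', 6.0, 5.0, 4.0]]
)

# Setup: one 1-D array of 3 coordinates per row, stored in an object column.
samples['coord'] = list(samples[['x', 'y', 'z']].to_numpy())

# The column's underlying array is 1-D with dtype object (one array per cell).
col = samples['coord'].to_numpy()
print(col.shape, col.dtype)

# np.stack joins the per-row 1-D arrays along a new first axis -> (M, K).
points = np.stack(col)
print(points.shape, points.dtype)
```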

I am, of course, open to a more Pythonic or more efficient way to achieve this if my approach is poor.

This stores a NumPy array in each row of the coord column:

samples['coord'] = list(samples[['x', 'y', 'z']].values)

Now:

>>> samples.coord[0]
array([ 1.,  2.,  3.])

I figured out that I can simply extract a NumPy array from the DataFrame and use it to compute the distance matrix:

import scipy.spatial

sample_array = np.array(samples[['x', 'y', 'z']])
dmat = scipy.spatial.distance_matrix(sample_array, sample_array)
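The Euclidean case that distance_matrix computes by default (p=2) can also be written with NumPy broadcasting. This is handy for checking results or when SciPy is not installed. The minimal sketch below uses a hypothetical helper, pairwise_euclidean. It builds an (M, N, K) intermediate, so memory grows with M × N, which is fine for small inputs like this one.

```python
import numpy as np
import pandas as pd


def pairwise_euclidean(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Hypothetical helper: Euclidean distances between rows of a (M, K) and b (N, K)."""
    # a[:, None, :] is (M, 1, K) and b[None, :, :] is (1, N, K);
    # subtracting broadcasts them to (M, N, K).
    diff = a[:, None, :] - b[None, :, :]
    # Sum squared differences over the coordinate axis, then take the root -> (M, N).
    return np.sqrt((diff ** 2).sum(axis=-1))


samples = pd.DataFrame(
    columns=['source', 'name', 'x', 'y', 'z'],
    data=[['a', 'apple', 1.0, 2.0, 3.0],
          ['b', 'pear', 2.0, 3.0, 4.0],
          ['c', 'tomato', 9.0, 8.0, 7.0],
          ['d', 'sandwich', 6.0, 5.0, 4.0]]
)

sample_array = samples[['x', 'y', 'z']].to_numpy(dtype=float)
dmat = pairwise_euclidean(sample_array, sample_array)
print(dmat.round(3))
```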

But I'd still like to have those little arrays embedded in the dataframe, alongside the other data, and I'd upvote and accept an answer that can do that.
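Below is a minimal sketch of one way to get both, assuming SciPy is installed. The per-row arrays stay in a coord column next to the other data, and they are stacked back into a 2-D array only when distance_matrix is called. Wrapping the result in a DataFrame labeled by source is an extra convenience and was not part of the question.

```python
import numpy as np
import pandas as pd
from scipy.spatial import distance_matrix

samples = pd.DataFrame(
    columns=['source', 'name', 'x', 'y', 'z'],
    data=[['a', 'apple', 1.0, 2.0, 3.0],
          ['b', 'pear', 2.0, 3.0, 4.0],
          ['c', 'tomato', 9.0, 8.0, 7.0],
          ['d', 'sandwich', 6.0, 5.0, 4.0]]
)

# One 1-D float array per row, embedded in an object column alongside the other data.
samples['coord'] = list(samples[['x', 'y', 'z']].to_numpy(dtype=float))

# distance_matrix expects 2-D (M, K) input, so re-stack the per-row arrays.
points = np.stack(samples['coord'].to_numpy())
dmat = distance_matrix(points, points)

# Optional: label rows and columns with the 'source' ids for readable lookups.
dist = pd.DataFrame(dmat, index=samples['source'], columns=samples['source'])
print(dist.loc['a', 'b'])  # ~1.732, i.e. sqrt(3)
```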
