
Create a DataFrame where a column is a list of tuples

I'm trying to create a DataFrame column whose cells hold tuples, using the code below:

import numpy as np
import pandas as pd

# creating the NumPy array
array = np.array([[('A', 1)], [('B', 2)]])

# creating a list of index names
index_values = ['x1', 'x2']

# creating a list of column names
column_values = ['(a,b)']

# creating the dataframe
df = pd.DataFrame(data = array,
                  index = index_values,
                  columns = column_values)

df

raises:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/tmp/ipykernel_45/2020978637.py in <module>
     13 df = pd.DataFrame(data = array, 
     14                   index = index_values,
---> 15                   columns = column_values)
     16 
     17 df

/opt/oss/conda3/lib/python3.7/site-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
    676                     dtype=dtype,
    677                     copy=copy,
--> 678                     typ=manager,
    679                 )
    680 

/opt/oss/conda3/lib/python3.7/site-packages/pandas/core/internals/construction.py in ndarray_to_mgr(values, index, columns, dtype, copy, typ)
    302         # by definition an array here
    303         # the dtypes will be coerced to a single dtype
--> 304         values = _prep_ndarray(values, copy=copy)
    305 
    306     if dtype is not None and not is_dtype_equal(values.dtype, dtype):

/opt/oss/conda3/lib/python3.7/site-packages/pandas/core/internals/construction.py in _prep_ndarray(values, copy)
    553         values = values.reshape((values.shape[0], 1))
    554     elif values.ndim != 2:
--> 555         raise ValueError(f"Must pass 2-d input. shape={values.shape}")
    556 
    557     return values

ValueError: Must pass 2-d input. shape=(2, 1, 2)

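Using a single value per cell instead, the DataFrame is created without error (note that (1) is just the integer 1; a one-element tuple would be written (1,)):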

array = np.array([[(1)], [(2)]])
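A quick shape check shows why this variant does not hit the error: the parentheses in (1) are only grouping, so each row holds a scalar and the array is already 2-D. This is an illustrative sketch; the names scalars and one_tuples are hypothetical.

```python
import numpy as np

# (1) is just the integer 1, so each inner list holds a scalar: a 2-D array
scalars = np.array([[(1)], [(2)]])
print(scalars.shape)  # (2, 1)

# A real one-element tuple needs a trailing comma, and NumPy turns it into a third axis
one_tuples = np.array([[(1,)], [(2,)]])
print(one_tuples.shape)  # (2, 1, 1)
```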


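The problem is how the NumPy array is created. Without a dtype, np.array treats each tuple as one more dimension (and coerces 'A' and 1 to a common string type), so you get a 3-D array of shape (2, 1, 2), while pd.DataFrame needs 2-D input. To keep each tuple as a single element, give the array a structured dtype describing the tuple's fields (here a string and an integer), and then cast it to object with astype(object) so that each cell becomes a Python tuple.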
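The shapes involved can be checked directly. A minimal sketch (the variable names plain, records and cells are illustrative, not from the original post) comparing the array built with and without the structured dtype:

```python
import numpy as np

# No dtype: each tuple becomes an extra axis and 'A'/1 are coerced to strings
plain = np.array([[('A', 1)], [('B', 2)]])
print(plain.shape)  # (2, 1, 2)  -> rejected by pd.DataFrame ("Must pass 2-d input")

# Structured dtype (one string field, one int field): each tuple is one record
records = np.array([[('A', 1)], [('B', 2)]], dtype='<U10,int')
print(records.shape)  # (2, 1)  -> 2-D, which pd.DataFrame accepts

# Casting the records to object turns each one into a Python tuple
cells = records.astype(object)
print(type(cells[0, 0]))
```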

Do the following:

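import numpy as np
import pandas as pd

# Structured dtype: each ('A', 1) tuple is stored as one record (str field, int field),
# giving a 2-D array of shape (2, 1); astype(object) turns each record into a tuple
array = np.array([[('A', 1)], [('B', 2)]], dtype='<U10,int').astype(object)

index_values = ['x1', 'x2']

column_values = ['(a,b)']

df = pd.DataFrame(data=array, index=index_values, columns=column_values)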

Output:

>>> df
     (a,b)
x1  (A, 1)
x2  (B, 2)
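If NumPy is not otherwise needed, a possibly simpler alternative (not part of the original answer) is to let pandas build the object column from a plain list of tuples, which avoids the structured dtype and the astype(object) step. A minimal sketch, reusing the index and column names from the question:

```python
import pandas as pd

# Build the column straight from a list of tuples, one tuple per row;
# pandas stores each tuple as a single cell in an object-dtype column
df = pd.DataFrame({'(a,b)': [('A', 1), ('B', 2)]}, index=['x1', 'x2'])

print(df)
print(df.loc['x1', '(a,b)'])  # ('A', 1)
```

Each cell then holds the original Python tuple, so df.loc['x1', '(a,b)'] returns ('A', 1).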
