Create dataframe where column is a list of tuples
I'm trying to create a column of tuples within a DataFrame, using the code below:
import numpy as np
import pandas as pd

# creating the NumPy array
array = np.array([[('A' , 1)], [('B' , 2)]])
# creating a list of index names
index_values = ['x1', 'x2']
# creating a list of column names
column_values = ['(a,b)']
# creating the dataframe
df = pd.DataFrame(data=array,
                  index=index_values,
                  columns=column_values)
df
returns:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
/tmp/ipykernel_45/2020978637.py in <module>
13 df = pd.DataFrame(data = array,
14 index = index_values,
---> 15 columns = column_values)
16
17 df
/opt/oss/conda3/lib/python3.7/site-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
676 dtype=dtype,
677 copy=copy,
--> 678 typ=manager,
679 )
680
/opt/oss/conda3/lib/python3.7/site-packages/pandas/core/internals/construction.py in ndarray_to_mgr(values, index, columns, dtype, copy, typ)
302 # by definition an array here
303 # the dtypes will be coerced to a single dtype
--> 304 values = _prep_ndarray(values, copy=copy)
305
306 if dtype is not None and not is_dtype_equal(values.dtype, dtype):
/opt/oss/conda3/lib/python3.7/site-packages/pandas/core/internals/construction.py in _prep_ndarray(values, copy)
553 values = values.reshape((values.shape[0], 1))
554 elif values.ndim != 2:
--> 555 raise ValueError(f"Must pass 2-d input. shape={values.shape}")
556
557 return values
ValueError: Must pass 2-d input. shape=(2, 1, 2)
Using a single-element tuple:
array = np.array([[(1)], [(2)]])
The way you are creating the NumPy array is wrong. Without an explicit dtype, NumPy unpacks each tuple into an extra dimension, which is why the error reports shape=(2, 1, 2). Since it is an array of tuples, you have to specify the dtype of the tuple's elements when creating the array, and then cast the result back to an object type with astype(object).
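To see where the third dimension comes from, the shapes can be checked directly. This is a small sketch added for illustration; the names `nested` and `single` are chosen here and do not appear in the question.

```python
import numpy as np

# NumPy treats each tuple as one more level of nesting, so the two
# values inside every tuple become a third axis.
nested = np.array([[('A', 1)], [('B', 2)]])
print(nested.ndim)   # 3
print(nested.shape)  # (2, 1, 2)

# (1) is not a tuple in Python, it is just the integer 1 in parentheses,
# so this array is an ordinary 2-D array of integers.
single = np.array([[(1)], [(2)]])
print(single.shape)  # (2, 1)
```

This is likely why the "single element tuple" variant does not raise the error: no tuple is actually involved, so the array is already 2-D.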
Do the following:
import numpy as np
import pandas as pd

array = np.array([[('A',1)], [('B',2)]], dtype=('<U10,int')).astype(object)
index_values = ['x1', 'x2']
column_values = ['(a,b)']
df = pd.DataFrame(data = array, index = index_values, columns = column_values)
Output:
>>> df
     (a,b)
x1  (A, 1)
x2  (B, 2)
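If the structured dtype feels heavy, two other routes should give a similar one-column frame of tuples. This is a sketch and not part of the original answer; `df_list`, `df_obj` and `arr` are names chosen here for illustration.

```python
import numpy as np
import pandas as pd

index_values = ['x1', 'x2']
column_values = ['(a,b)']

# Option 1: skip NumPy and pass the tuples as a plain Python list,
# so pandas stores each tuple as a single object in the column.
df_list = pd.DataFrame({'(a,b)': [('A', 1), ('B', 2)]}, index=index_values)

# Option 2: preallocate a 2-D object array and fill it cell by cell,
# so NumPy never gets the chance to unpack the tuples.
arr = np.empty((2, 1), dtype=object)
arr[0, 0] = ('A', 1)
arr[1, 0] = ('B', 2)
df_obj = pd.DataFrame(data=arr, index=index_values, columns=column_values)

print(df_list)
print(df_obj)
```

Option 1 is the shorter choice when the data starts out as Python objects anyway. Option 2 keeps the original `data`, `index`, `columns` call shape from the question.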