[英]I wrote this code and received this error. How should I fix this?
import pandas as pd
import numpy as np
import sklearn as preprocessing
country ={'data source':['data','country name','brazil','switzerland','germany','denmark','spain','france','japan','greece','iran','kuwait','morocco','nigeria','qatar','sweden','india','world'],
'unnamed1':['nan','country code','BRA','CHE','DEU','DNK','ESP','FRA','JPN','GRC','IRN','KWT','MAR','NGA','QAT','SWE','IND','WLD'],
'unnamed2':[2016,'population growth',0.817555711,1.077221168,1.193866758,0.834637611,-0.008048086,0.407491036,-0.115284177,-0.687542545,1.1487886,2.924206194,'nan',1.148214693,1.18167997],
'unnamed3':['nan','total population',207652865,8372098,82667685,'nan',46443959,66896109,126994511,10746740,80277428,4052584,35276786,185989640,2569804,9903122,1324171354,7442135578],
'unnamed4':['area(sq.km)',8358140,39516,348900,42262,500210,547557,394560,128900,16287601,'nan',446300,910770,11610,407310,2973190,129733172.7]}
my_df = pd.DataFrame(country, index=[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17], columns=['data source','unnamed1','unnamed2','unnamed3','unnamed4'])
print(my_df)
and this is the error:这是错误:
Traceback (most recent call last):
File "c:/Users/se7en/Desktop/AI/skl.py", line 11, in <module>
my_df = pd.DataFrame(country, index=[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17], columns=['data source','unnamed1','unnamed2','unnamed3','unnamed4'])
File "C:\Program Files\Python37\lib\site-packages\pandas\core\frame.py", line 614, in __init__
mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy, typ=manager)
File "C:\Program Files\Python37\lib\site-packages\pandas\core\internals\construction.py", line 465, in dict_to_mgr
arrays, data_names, index, columns, dtype=dtype, typ=typ, consolidate=copy
File "C:\Program Files\Python37\lib\site-packages\pandas\core\internals\construction.py", line 136, in arrays_to_mgr
arrays, arr_names, axes, consolidate=consolidate
File "C:\Program Files\Python37\lib\site-packages\pandas\core\internals\managers.py", line 1776, in create_block_manager_from_arrays
raise construction_error(len(arrays), arrays[0].shape, axes, e)
File "C:\Program Files\Python37\lib\site-packages\pandas\core\internals\managers.py", line 1773, in create_block_manager_from_arrays
blocks = _form_blocks(arrays, names, axes, consolidate)
File "C:\Program Files\Python37\lib\site-packages\pandas\core\internals\managers.py", line 1863, in _form_blocks
items_dict["ObjectBlock"], np.object_, consolidate=consolidate
File "C:\Program Files\Python37\lib\site-packages\pandas\core\internals\managers.py", line 1903, in _simple_blockify
values, placement = _stack_arrays(tuples, dtype)
File "C:\Program Files\Python37\lib\site-packages\pandas\core\internals\managers.py", line 1959, in _stack_arrays
stacked[i] = arr
ValueError: could not broadcast input array from shape (15,) into shape (18,)
All the lists/arrays in dictionary must have the same length for the DataFrame constructor to accept the input.字典中的所有列表/数组必须具有相同的长度,以便 DataFrame 构造函数接受输入。
This is not the case with your data:您的数据不是这种情况:
{k:len(v) for k,v in country.items()}
output: output:
{'data source': 18,
'unnamed1': 18,
'unnamed2': 15,
'unnamed3': 18,
'unnamed4': 17}
Either trim the elements to the min length, or pad the shortest ones to the max length.要么将元素修剪到最小长度,要么将最短的元素填充到最大长度。
Another option to circumvent this might be to use a dictionary of Series, which will do the padding job automatically:另一种避免这种情况的方法可能是使用 Series 的字典,它将自动完成填充工作:
df = pd.DataFrame({k:pd.Series(v) for k,v in country.items()})
output: output:
data source unnamed1 unnamed2 unnamed3 unnamed4
0 data nan 2016 nan area(sq.km)
1 country name country code population growth total population 8358140
2 brazil BRA 0.817556 207652865 39516
3 switzerland CHE 1.077221 8372098 348900
4 germany DEU 1.193867 82667685 42262
5 denmark DNK 0.834638 nan 500210
6 spain ESP -0.008048 46443959 547557
7 france FRA 0.407491 66896109 394560
8 japan JPN -0.115284 126994511 128900
9 greece GRC -0.687543 10746740 16287601
10 iran IRN 1.148789 80277428 nan
11 kuwait KWT 2.924206 4052584 446300
12 morocco MAR nan 35276786 910770
13 nigeria NGA 1.148215 185989640 11610
14 qatar QAT 1.18168 2569804 407310
15 sweden SWE NaN 9903122 2973190
16 india IND NaN 1324171354 129733172.7
17 world WLD NaN 7442135578 NaN
NB.注意。 you should clarify the output you expect as it seems here that your lists are mixing labels and data您应该澄清您期望的 output,因为在这里您的列表似乎混合了标签和数据
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.