List to pandas DataFrame
I have a list like the following:
```python
['[[["3200","house_number"],["northline ave","road"],["ste 360","unit"],["greensboro","city"],["27408","postcode"],["7611","house_number"],["ncus","road"]]]\n',
'[[["1530","house_number"],["jamacha rd","road"],["ste pel","unit"],["ca","road"],["jon","city"],["ca","state"],["92019","postcode"],["us","country"]]]\n',
'[[["625","house_number"],["westport pkwy","road"],["grapevine","city"],["76051","postcode"],["txus","city"]]]\n',
'[[["609 principale stpaul de illeauxnoix quebec ca nadaus","house"]]]\n',
'[[["734","house_number"],["warmiinsterunited states of ameri caus","house"]]]\n',
'[[["595","house_number"],["market street","road"],["suite 2500","unit"],["san francisco","city"],["ca","state"],["94105us","postcode"]]]\n',
'[[["40","house_number"],["first plaza 4 th flooralbuquerque","road"],["87102","house_number"],["nmus","road"]]]\n',
'[[["519","house_number"],["regents gate","road"],["drhenderson","city"],["nv","state"],["89012","postcode"],["us","country"]]]\n',
'[[["400","house_number"],["garden city plz","road"],["ste 510","unit"],["garden city","suburb"],["nyus","city"]]]\n']
```
I also have an empty DataFrame, `df2`:
```python
df2=pd.DataFrame(columns=['house','category','near','house_number','road','unit','level','staircase','entrance','po_box','postcode','suburb','city_district', 'city','island', 'state_district', 'state', 'country_region', 'country', 'world_region'])
```
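I want to put each list entry into the DataFrame according to its keys (the labels); if an entry has no value for a label, that cell can simply stay empty. I tried this with: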
```python
df = df.reindex(df2.columns, fill_value="")
```
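However, I get an error saying the labels must be unique. As the list shows, some labels repeat within a single entry: in the first entry, for example, `road` (and also `house_number`) appears twice. Each label should appear only once, so I want to concatenate all values that share a key and then reindex.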
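A minimal sketch of the problem and one possible fix, using the first address from the list as a pandas Series with the labels as its index. The Series `s`, the shortened target column list, and the `", "` separator are illustrative assumptions. `reindex` refuses an index that contains duplicate labels; grouping on the index level and joining the strings collapses the duplicates, after which `reindex` succeeds.

```python
import pandas as pd

# One address as label -> value, with "road" and "house_number" repeated
# exactly as in the first entry of the list.
s = pd.Series(["3200", "northline ave", "7611", "ncus"],
              index=["house_number", "road", "house_number", "road"])

target = ["house", "house_number", "road"]

try:
    s.reindex(target, fill_value="")
except ValueError as err:
    # pandas cannot reindex an axis that has duplicate labels.
    print("reindex failed:", err)

# Join values that share a label (original order is kept within each group)...
merged = s.groupby(level=0).agg(", ".join)
# ...after which reindex works and missing labels are filled with "".
fixed = merged.reindex(target, fill_value="")
print(fixed)
```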
Please help me concatenate the values by key and put them into the predefined DataFrame `df2`.
Thanks in advance.
Try this:
```python
import ast

import pandas as pd

lst = ['[[["3200","house_number"],["northline ave","road"],["ste 360","unit"],["greensboro","city"],["27408","postcode"],["7611","house_number"],["ncus","road"]]]\n',
'[[["1530","house_number"],["jamacha rd","road"],["ste pel","unit"],["ca","road"],["jon","city"],["ca","state"],["92019","postcode"],["us","country"]]]\n',
'[[["625","house_number"],["westport pkwy","road"],["grapevine","city"],["76051","postcode"],["txus","city"]]]\n',
'[[["609 principale stpaul de illeauxnoix quebec ca nadaus","house"]]]\n',
'[[["734","house_number"],["warmiinsterunited states of ameri caus","house"]]]\n',
'[[["595","house_number"],["market street","road"],["suite 2500","unit"],["san francisco","city"],["ca","state"],["94105us","postcode"]]]\n',
'[[["40","house_number"],["first plaza 4 th flooralbuquerque","road"],["87102","house_number"],["nmus","road"]]]\n',
'[[["519","house_number"],["regents gate","road"],["drhenderson","city"],["nv","state"],["89012","postcode"],["us","country"]]]\n',
'[[["400","house_number"],["garden city plz","road"],["ste 510","unit"],["garden city","suburb"],["nyus","city"]]]\n']
cols = ['house','category','near','house_number','road','unit','level','staircase','entrance','po_box','postcode','suburb','city_district', 'city','island', 'state_district', 'state', 'country_region', 'country', 'world_region']
# Each string parses to a one-element list wrapping the [value, label] pairs;
# the comprehension unwraps that outer list.
lst = [item for x in lst for item in ast.literal_eval(x)]
# One dict per address, keyed by label. If a label repeats (e.g. "road"),
# the later value overwrites the earlier one.
df_dict = {idx: {v[1]: v[0] for v in ls} for idx, ls in enumerate(lst)}
# Labels absent from an address come out as NaN; fillna("") leaves them empty.
df = pd.DataFrame.from_dict(df_dict, orient="index", columns=cols).fillna("")
```
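The dict comprehension above keeps only the last value when a label repeats (for the first address, `road` ends up as `"ncus"`), so it does not concatenate. A sketch of a variant that does, assuming values sharing a label should be joined with `", "` in their original order. The helper name `to_frame` is hypothetical, and the input is a three-entry subset of the question's list.

```python
import ast
from collections import defaultdict

import pandas as pd

# Subset of the question's data, kept short for the example.
raw = [
    '[[["3200","house_number"],["northline ave","road"],["ste 360","unit"],["greensboro","city"],["27408","postcode"],["7611","house_number"],["ncus","road"]]]\n',
    '[[["609 principale stpaul de illeauxnoix quebec ca nadaus","house"]]]\n',
    '[[["40","house_number"],["first plaza 4 th flooralbuquerque","road"],["87102","house_number"],["nmus","road"]]]\n',
]

cols = ['house', 'category', 'near', 'house_number', 'road', 'unit', 'level',
        'staircase', 'entrance', 'po_box', 'postcode', 'suburb', 'city_district',
        'city', 'island', 'state_district', 'state', 'country_region', 'country',
        'world_region']


def to_frame(raw_lines, columns, sep=", "):
    """Hypothetical helper: one row per address, joining values that share a label."""
    rows = []
    for line in raw_lines:
        # Each line parses to a list holding one list of [value, label] pairs.
        for pairs in ast.literal_eval(line):
            merged = defaultdict(list)
            for value, label in pairs:
                merged[label].append(value)
            # Every column gets a string: absent labels become "".
            # Labels not listed in `columns` are dropped.
            rows.append({col: sep.join(merged.get(col, [])) for col in columns})
    return pd.DataFrame(rows, columns=columns)


df = to_frame(raw, cols)
print(df[["house_number", "road", "house"]])
```

Building each row with every column up front avoids NaN entirely, so no `fillna` step is needed; the trade-off is that any label not present in `cols` is silently discarded.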