Python OrderedDict issue
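I have a CSV file in which every row holds the values for the columns Location, MovieDate, Formatted_Address, Lat and Lng. I want to group by Location and append together all the MovieDate values that share the same Location, and I was told to use OrderedDict.
Data before: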
Location,MovieDate,Formatted_Address,Lat,Lng
"Edgebrook Park, Chicago ",Jun-7 A League of Their Own,"Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA",41.9998876,-87.7627672
"Edgebrook Park, Chicago ","Jun-9 It's a Mad, Mad, Mad, Mad World","Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA",41.9998876,-87.7627672
For every set of rows with the same location (like the two above), I want output like this, so that no location is repeated:
"Edgebrook Park, Chicago ","Jun-7 A League of Their Own Jun-9 It's a Mad, Mad, Mad, Mad World","Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA",41.9998876,-87.7627672
What is wrong with my code that uses OrderedDict to do this?
from collections import OrderedDict
od = OrderedDict()
import csv
with open("MovieDictFormatted.csv") as f,open("MoviesCombined.csv" ,"w") as out:
    r = csv.reader(f)
    wr = csv.writer(out)
    header = next(r)
    for row in r:
        loc,rest = row[0], row[1]
        od.setdefault(loc, []).append(rest)
    wr.writerow(header)
    for loc,vals in od.items():
        wr.writerow([loc]+vals)
What I end up with looks like this:
['Edgebrook Park, Chicago ', 'Jun-7 A League of Their Own']
['Gage Park, Chicago ', "Jun-9 It's a Mad, Mad, Mad, Mad World"]
['Jefferson Memorial Park, Chicago ', 'Jun-12 Monsters University ', 'Jul-11 Frozen ', 'Aug-8 The Blues Brothers ']
['Commercial Club Playground, Chicago ', 'Jun-12 Despicable Me 2']
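The problem is that the other columns do not show up in this case. How can I include them? I would also like the MovieDate values to be one long string, such as: 'Jun-12 Monsters University Jul-11 Frozen Aug-8 The Blues Brothers '
rather than:
'Jun-12 Monsters University ', 'Jul-11 Frozen ', 'Aug-8 The Blues Brothers '
Thanks, everyone. I am a Python noob.
Unfortunately, changing row[0], row[1] to row[0], row[1:] did not give me what I want. I only want to append the values of the second column (MovieDate), not copy all the other columns as well, like this: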
['Jefferson Memorial Park, Chicago ', ['Jun-12 Monsters University ', 'Jefferson Memorial Park, 4822 North Long Avenue, Chicago, IL 60630, USA', '41.76083920000001', '-87.6294353'], ['Jul-11 Frozen ', 'Jefferson Memorial Park, 4822 North Long Avenue, Chicago, IL 60630, USA', '41.76083920000001', '-87.6294353'], ['Aug-8 The Blues Brothers ', 'Jefferson Memorial Park, 4822 North Long Avenue, Chicago, IL 60630, USA', '41.76083920000001', '-87.6294353']]
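You only need a few changes: join the remaining fields of each row, and, to get rid of the duplicated lat and long, use them as part of the key as well:
with open("data.csv") as f,open("new.csv" ,"w") as out:
    r = csv.reader(f)
    wr= csv.writer(out)
    header = next(r)
    for row in r:
        # key: (location, lat, long); value: the fields in between, joined
        od.setdefault((row[0], row[-2], row[-1]), []).append(" ".join(row[1:-2]))
    wr.writerow(header)
    for loc,vals in od.items():
        wr.writerow([loc[0]] + vals+list(loc[1:]))
Output: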
Location,MovieDate,Formatted_Address,Lat,Lng
"Edgebrook Park, Chicago ","Jun-7 A League of Their Own Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA","Jun-9 It's a Mad, Mad, Mad, Mad World Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA",41.9998876,-87.7627672
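A League of Their Own now ends up on the same row as It's a Mad, Mad, Mad, Mad World. row[1:-2] takes everything except the location, the latitude and the longitude; we store the latitude and longitude in the key tuple so that they are written only once, at the end of each row, instead of being repeated.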
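To make the slice and the key tuple concrete, here is a small sketch on one row from the sample data. The variable names (key, middle, joined) are only illustrative and do not appear in the answer.

```python
# One parsed row from the sample data, as csv.reader would return it.
row = [
    "Edgebrook Park, Chicago ",
    "Jun-7 A League of Their Own",
    "Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA",
    "41.9998876",
    "-87.7627672",
]

# Grouping key: the location plus latitude and longitude.
key = (row[0], row[-2], row[-1])

# Everything between the location and the coordinates,
# i.e. the MovieDate and the Formatted_Address.
middle = row[1:-2]
joined = " ".join(middle)
```

Because the address is part of middle, it is repeated once per movie in the output above.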
Using names and unpacking may make it a little easier to follow:
with open("data.csv") as f, open("new.csv", "w") as out:
    r = csv.reader(f)
    wr = csv.writer(out)
    header = next(r)
    for row in r:
        loc, mov, form, lat, long = row
        od.setdefault((loc, lat, long), []).append("{} {}".format(mov, form))
    wr.writerow(header)
    for loc, vals in od.items():
        wr.writerow([loc[0]] + vals + list(loc[1:]))
Using csv.DictWriter to keep the five columns:
from collections import OrderedDict
import csv

od = OrderedDict()
with open("data.csv") as f, open("new.csv", "w") as out:
    # fieldnames is passed explicitly, so the header line is read as an ordinary
    # row and is written back out as the first line of the output.
    r = csv.DictReader(f,fieldnames=['Location', 'MovieDate', 'Formatted_Address', 'Lat', 'Lng'])
    wr = csv.DictWriter(out, fieldnames=r.fieldnames)
    for row in r:
        od.setdefault(row["Location"], dict(Location=row["Location"], Lat=row["Lat"], Lng=row["Lng"],
                                            MovieDate=[], Formatted_Address=row["Formatted_Address"]))
        od[row["Location"]]["MovieDate"].append(row["MovieDate"])
    for loc, vals in od.items():
        # collapse the list of dates into a single string before writing
        od[loc]["MovieDate"]= ", ".join(od[loc]["MovieDate"])
        wr.writerow(vals)
Output:
"Edgebrook Park, Chicago ","Jun-7 A League of Their Own, Jun-9 It's a Mad, Mad, Mad, Mad World","Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA",41.9998876,-87.7627672
So the five columns stay intact: we join the "MovieDate" values into a single string, and Formatted_Address is always the same for a given location, so we never need to update it.
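The same idea can be sketched as a small function that can be run without any files. This is only an illustration under a few assumptions: the name combine_movie_dates and its sep parameter are made up here, the header is read by DictReader instead of being passed in, the dates are stripped of stray whitespace, and a space is used as the separator to match the output the question asks for.

```python
import csv
import io
from collections import OrderedDict


def combine_movie_dates(src, dst, sep=" "):
    """Group rows by Location and join their MovieDate values into one field."""
    reader = csv.DictReader(src)  # the header row supplies the field names
    grouped = OrderedDict()
    for row in reader:
        # The first row seen for a location is kept; later rows only add dates.
        entry = grouped.setdefault(row["Location"], dict(row, MovieDate=[]))
        entry["MovieDate"].append(row["MovieDate"].strip())
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    for entry in grouped.values():
        writer.writerow(dict(entry, MovieDate=sep.join(entry["MovieDate"])))


# The two sample rows from the question, held in memory instead of a file.
sample = '''Location,MovieDate,Formatted_Address,Lat,Lng
"Edgebrook Park, Chicago ",Jun-7 A League of Their Own,"Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA",41.9998876,-87.7627672
"Edgebrook Park, Chicago ","Jun-9 It's a Mad, Mad, Mad, Mad World","Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA",41.9998876,-87.7627672
'''
out = io.StringIO()
combine_movie_dates(io.StringIO(sample), out)
result = out.getvalue()
```

With real files, the two io.StringIO objects would be replaced by the handles returned from open().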
It turns out that, to match what you want, all you need to do is concatenate the MovieDate values and remove the duplicate entries for Location, Lat, Lng and Formatted_Address.
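Let's try changing
od.setdefault(loc, []).append(rest)
to
od[loc] = ' '.join([od.get(loc, ''), ' '.join(rest)])
and then write each row out. Since vals is now a single string rather than a list, [loc]+vals would raise a TypeError, so pass it as one field:
wr.writerow([loc, vals])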
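As a rough sketch of that string-accumulating approach, assuming each row has already been reduced to a (location, MovieDate) pair, the values from the question can be combined like this. The names rows and output_rows are illustrative only, and stripping the trailing spaces is a choice made here, not something the answer does.

```python
from collections import OrderedDict

# (Location, MovieDate) pairs taken from the output shown in the question.
rows = [
    ["Jefferson Memorial Park, Chicago ", "Jun-12 Monsters University "],
    ["Jefferson Memorial Park, Chicago ", "Jul-11 Frozen "],
    ["Commercial Club Playground, Chicago ", "Jun-12 Despicable Me 2"],
]

od = OrderedDict()
for loc, movie_date in rows:
    if loc in od:
        # Location seen before: extend the existing string.
        od[loc] = od[loc] + " " + movie_date.strip()
    else:
        od[loc] = movie_date.strip()

# Each value is a single string, so it is written as one field.
output_rows = [[loc, dates] for loc, dates in od.items()]
```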
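Assuming the location is the first item of each row:
groups = {}
for row in csv.reader(f):  # f is the open input file
    if row[0] not in groups:
        groups[row[0]] = []
    groups[row[0]].append(row[1:])
For each location you then have the rest of every row that shares it:
for key, value in groups.items():
    # value is a list of lists, so format it instead of concatenating it to key
    out.write("{} {}\n".format(key, value))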
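A variation on this grouping, sketched with collections.defaultdict and csv.writer so that the result is valid CSV. The two-column sample is cut down from the rows in the question, and the names source, target and lines are assumptions made for this example.

```python
import csv
import io
from collections import defaultdict

# Two-column sample built from rows shown in the question.
source = io.StringIO(
    'Location,MovieDate\n'
    '"Jefferson Memorial Park, Chicago ",Jun-12 Monsters University \n'
    '"Jefferson Memorial Park, Chicago ",Jul-11 Frozen \n'
    '"Commercial Club Playground, Chicago ",Jun-12 Despicable Me 2\n'
)

reader = csv.reader(source)
header = next(reader)

groups = defaultdict(list)  # a missing key starts out as an empty list
for row in reader:
    groups[row[0]].append(row[1:])

target = io.StringIO()
writer = csv.writer(target)
writer.writerow(header)
for location, rests in groups.items():
    # Flatten the per-row remainders and join them into one field.
    dates = " ".join(value.strip() for rest in rests for value in rest)
    writer.writerow([location, dates])

lines = target.getvalue().splitlines()
```

The output order follows the order in which locations first appear, which relies on dictionaries preserving insertion order (Python 3.7 and later).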