Python OrderedDict issue

Question

I have a CSV file whose rows have the columns Location, MovieDate, Formatted_Address, Lat and Lng. I have been told to use OrderedDict if I want to group by Location and combine all the MovieDate values that share the same Location value.

Example of the data:

Location,MovieDate,Formatted_Address,Lat,Lng
    "Edgebrook Park, Chicago ",Jun-7 A League of Their Own,"Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA",41.9998876,-87.7627672
    "Edgebrook Park, Chicago ","Jun-9 It's a Mad, Mad, Mad, Mad World","Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA",41.9998876,-87.7627672

For every row that has the same location (as in this example), I'd like to produce output like this, so that there are no duplicate locations:

 "Edgebrook Park, Chicago ","Jun-7 A League of Their Own Jun-9 It's a Mad, Mad, Mad, Mad World","Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA",41.9998876,-87.7627672

What's wrong with my code using OrderedDict to do this?

from collections import OrderedDict

od = OrderedDict()
import csv
with open("MovieDictFormatted.csv") as f,open("MoviesCombined.csv" ,"w") as out:
    r = csv.reader(f)
    wr = csv.writer(out)
    header = next(r)
    for row in r:
        loc,rest = row[0], row[1]
        od.setdefault(loc, []).append(rest)
    wr.writerow(header)
    for loc,vals in od.items():
        wr.writerow([loc]+vals)

What I end up with is something like this:

['Edgebrook Park, Chicago ', 'Jun-7 A League of Their Own']
['Gage Park, Chicago ', "Jun-9 It's a Mad, Mad, Mad, Mad World"]
['Jefferson Memorial Park, Chicago ', 'Jun-12 Monsters University ', 'Jul-11 Frozen ', 'Aug-8 The Blues Brothers ']
['Commercial Club Playground, Chicago ', 'Jun-12 Despicable Me 2']

The issue is that the other columns don't show up in this case. How would I best do that? I would also prefer to make the MovieDate values one long string, as here: 'Jun-12 Monsters University Jul-11 Frozen Aug-8 The Blues Brothers ' instead of:

'Jun-12 Monsters University ', 'Jul-11 Frozen ', 'Aug-8 The Blues Brothers '

Thanks guys, appreciate it. I'm a Python noob.

Changing row[0], row[1] to row[0], row[1:] unfortunately doesn't give me what I want. I only want to combine the values in the second column (MovieDate), not replicate all the other columns like this:

['Jefferson Memorial Park, Chicago ', ['Jun-12 Monsters University ', 'Jefferson Memorial Park, 4822 North Long Avenue, Chicago, IL 60630, USA', '41.76083920000001', '-87.6294353'], ['Jul-11 Frozen ', 'Jefferson Memorial Park, 4822 North Long Avenue, Chicago, IL 60630, USA', '41.76083920000001', '-87.6294353'], ['Aug-8 The Blues Brothers ', 'Jefferson Memorial Park, 4822 North Long Avenue, Chicago, IL 60630, USA', '41.76083920000001', '-87.6294353']]

Answer 1

You just need a couple of changes: join the columns between the location and the lat/long, and, to remove the duplicate lat and long values, use those as part of the key as well:

with open("data.csv") as f,open("new.csv" ,"w") as out:
    r = csv.reader(f)
    wr= csv.writer(out)
    header = next(r)
    for row in r:
        od.setdefault((row[0], row[-2], row[-1]), []).append(" ".join(row[1:-2]))
    wr.writerow(header)
    for loc,vals in od.items():
        wr.writerow([loc[0]] + vals+list(loc[1:]))

Output:

Location,MovieDate,Formatted_Address,Lat,Lng
"Edgebrook Park, Chicago ","Jun-7 A League of Their Own Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA","Jun-9 It's a Mad, Mad, Mad, Mad World Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA",41.9998876,-87.7627672

A League of Their Own comes first because its row comes before the Mad, Mad line in the input. row[1:-2] gets everything except the location, lat and long, and we store the lat and long in the key tuple to avoid writing them again for each row.

Using names and unpacking might make it a little easier to follow:
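To see what the slice and the tuple key do for a single row, here is a small sketch using the first sample row from the question. The variable names key, middle and joined are only for illustration and are not part of the answer's code.

```python
# One parsed row from the sample data in the question.
row = [
    "Edgebrook Park, Chicago ",
    "Jun-7 A League of Their Own",
    "Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA",
    "41.9998876",
    "-87.7627672",
]

# The grouping key: the location plus the last two columns (Lat, Lng).
key = (row[0], row[-2], row[-1])

# row[1:-2] skips the first column and the last two, leaving
# MovieDate and Formatted_Address, which are then joined with a space.
middle = row[1:-2]
joined = " ".join(middle)

print(key)
print(middle)
print(joined)
```

Because the address ends up inside the joined string, it is repeated once per movie, which is what the output above shows.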

with open("data.csv") as f, open("new.csv", "w") as out:
    r = csv.reader(f)
    wr = csv.writer(out)
    header = next(r)
    for row in r:
        loc, mov, form, lat, long = row
        od.setdefault((loc, lat, long), []).append("{} {}".format(mov, form))
    wr.writerow(header)
    for loc, vals in od.items():
        wr.writerow([loc[0]] + vals + list(loc[1:]))

Using csv.DictWriter to keep five columns:

import csv
from collections import OrderedDict

od = OrderedDict()

with open("data.csv") as f, open("new.csv", "w") as out:
    r = csv.DictReader(f)  # field names are taken from the header row
    wr = csv.DictWriter(out, fieldnames=r.fieldnames)
    wr.writeheader()
    for row in r:
        # The first row seen for a location supplies the shared columns.
        od.setdefault(row["Location"], dict(Location=row["Location"], Lat=row["Lat"], Lng=row["Lng"],
                                            MovieDate=[], Formatted_Address=row["Formatted_Address"]))
        od[row["Location"]]["MovieDate"].append(row["MovieDate"])
    for loc, vals in od.items():
        # Collapse the list of movie dates into a single string.
        vals["MovieDate"] = ", ".join(vals["MovieDate"])
        wr.writerow(vals)

Output:

"Edgebrook Park, Chicago ","Jun-7 A League of Their Own, Jun-9 It's a Mad, Mad, Mad, Mad World","Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA",41.9998876,-87.7627672

So the five columns remain intact. We joined the MovieDate values into a single string, and Formatted_Address is always the same for a given Location, so we don't need to update it.

It turns out that to match what you wanted, all we needed to do was concatenate the MovieDate values and remove the duplicate entries for Location, Lat, Lng and Formatted_Address.
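For the exact output asked for in the question (movie dates separated by a space rather than a comma), the same idea can be sketched as below. This is only an illustration under a few assumptions: combine_movies is a made-up helper name, io.StringIO stands in for the real input and output files so the example runs on its own, and stripping trailing whitespace from each MovieDate is my choice rather than something the question requires.

```python
import csv
import io
from collections import OrderedDict


def combine_movies(src, dst):
    """Group rows by Location and join their MovieDate values with spaces."""
    reader = csv.DictReader(src)  # the header row supplies the field names
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    grouped = OrderedDict()  # keeps locations in first-seen order
    for row in reader:
        # The first row seen for a location supplies Formatted_Address, Lat, Lng.
        entry = grouped.setdefault(row["Location"], dict(row, MovieDate=[]))
        entry["MovieDate"].append(row["MovieDate"].strip())
    writer.writeheader()
    for entry in grouped.values():
        entry["MovieDate"] = " ".join(entry["MovieDate"])
        writer.writerow(entry)


# Sample rows copied from the question.
sample = '''Location,MovieDate,Formatted_Address,Lat,Lng
"Edgebrook Park, Chicago ",Jun-7 A League of Their Own,"Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA",41.9998876,-87.7627672
"Edgebrook Park, Chicago ","Jun-9 It's a Mad, Mad, Mad, Mad World","Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA",41.9998876,-87.7627672
'''

dst = io.StringIO()
combine_movies(io.StringIO(sample), dst)
result = dst.getvalue()
print(result)
```

With real files on Python 3, the csv documentation recommends opening them with newline="" so the writer controls the line endings itself.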

Answer 2

Let's try changing

od.setdefault(loc, []).append(rest)

to

od[loc] = ' '.join([od.get(loc, ''), rest]).strip()

so that each location maps to a single string, and then write that string as one column:

wr.writerow([loc, vals])

Answer 3

Assuming the location is the first item of each row (r and wr are the csv reader and writer from the question):

groups = {}
for row in r:
    if row[0] not in groups:
        groups[row[0]] = []
    groups[row[0]].append(row[1:])

For every location, you then have the rest of each of its rows:

for key, rows in groups.items():
    for rest in rows:
        wr.writerow([key] + rest)
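The `if ... not in` check can also be left to collections.defaultdict. A minimal sketch, assuming the rows have already been parsed by csv.reader; the rows list here is hand-built from the sample data in the question, and the names rows and groups are only for illustration.

```python
from collections import defaultdict

jefferson = "Jefferson Memorial Park, 4822 North Long Avenue, Chicago, IL 60630, USA"
edgebrook = "Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA"

# Rows as csv.reader would yield them (values taken from the question).
rows = [
    ["Edgebrook Park, Chicago ", "Jun-7 A League of Their Own", edgebrook, "41.9998876", "-87.7627672"],
    ["Jefferson Memorial Park, Chicago ", "Jun-12 Monsters University ", jefferson, "41.76083920000001", "-87.6294353"],
    ["Jefferson Memorial Park, Chicago ", "Jul-11 Frozen ", jefferson, "41.76083920000001", "-87.6294353"],
    ["Jefferson Memorial Park, Chicago ", "Aug-8 The Blues Brothers ", jefferson, "41.76083920000001", "-87.6294353"],
]

# A missing key is created as an empty list on first access.
groups = defaultdict(list)
for row in rows:
    groups[row[0]].append(row[1:])

for location, rests in groups.items():
    print(location, len(rests))
```

On Python 3.7 and later, plain dicts (and so defaultdict) keep insertion order, so the locations come out in the order they first appear in the file.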
