Very noob Python question about optimising some nested for loops

Question

I am very new to Python, and have just got a small piece of code working to compile some user data into a single file. But since I am only learning, I don't just want it to run; I want it to actually use the functionality Python provides. For reference, here is the part of the code I think can be made faster.

In short, I have a list of usernames in a text file, and usage data for each user over a 4 month period, stored as one CSV file per day.

The logic is:

loop over each CSV:
    loop over each line in that CSV:
        loop over the list of usernames:
            if the username matches, append the user data for that user

The inner two for loops are what I am really focusing on improving, since that is where the bottleneck is, given the number of users. In what follows, username_list is a list that I read from the text file and read_csv is a list that is read from the CSV file. The working code is as follows:

import csv

# Initialise a dictionary of lists to store the final data; the keys are the usernames
main_data = {}
with open(".\\listofusernames.txt") as usernames:
    username_list = usernames.read().splitlines()

for user in username_list:
    main_data[user] = []

# Loop over the CSV files with usage data for each day for 3 months
for i in range(1, 91):
    csvdir = f".\\csvfiles\\usagedata_{i}.csv"
    with open(csvdir, 'r', newline='') as daily_usage_csv:
        read_csv = list(csv.reader(daily_usage_csv))

    # Nested loops over the username list and this day's CSV to get the data for the correct user
    for user in username_list:
        for line in read_csv:
            # Username is stored in the first column of the CSV, so we check index 0 of the line
            if line[0] == user:
                # Usage data is in the second and third columns of the CSV, so we append indices 1 and 2
                main_data[user].append(line[1])
                main_data[user].append(line[2])
                break

The place where I really want to optimise is the inner two for loops, over the username list and the lines in the CSV. I was hoping to do this using Python's map() function rather than an explicit for loop. The issue is that the object I am iterating over is not what indexes the thing I am appending, so I am not sure how to implement that.
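On its own, map() would not remove the bottleneck: the cost comes from scanning every CSV row for every username. A minimal sketch, assuming each row holds the username in column 0 and the usage data in columns 1 and 2 as described above, of turning each day's rows into a dict first, after which map() becomes a natural fit. The helper `usage_for_day` and the sample data are hypothetical, not part of the original code.

```python
import csv
import io


def usage_for_day(username_list, rows):
    """Map each username to its (col1, col2) usage for one day, or None."""
    # Build a lookup table once per file: username -> (second column, third column).
    # setdefault keeps the FIRST row seen for each username, mirroring the
    # `break` in the question's nested loop.
    lookup = {}
    for row in rows:
        if not row:  # skip blank lines, which csv.reader returns as []
            continue
        lookup.setdefault(row[0], (row[1], row[2]))
    # Once the rows are in a dict, map() can apply lookup.get to every username.
    # Each call is an average O(1) hash lookup instead of a scan over all rows;
    # users missing from this file map to None.
    return dict(zip(username_list, map(lookup.get, username_list)))


# Hypothetical stand-ins for two of the daily usagedata_{i}.csv files
day_files = [
    io.StringIO("alice,10,20\nbob,5,7\n"),
    io.StringIO("bob,6,8\ncarol,1,2\n"),
]
username_list = ["alice", "bob"]
main_data = {user: [] for user in username_list}

for f in day_files:
    for user, usage in usage_for_day(username_list, csv.reader(f)).items():
        if usage is not None:
            main_data[user].extend(usage)  # same effect as the two append() calls

print(main_data)
```

The speed-up here comes from the dict, not from map(); map() simply reads well once each lookup is a single function call.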

Can anyone give some simple tips for how to make the loops faster here?

Answer 1

You don't need two for loops. You only need one, to iterate over the CSV usage data. For each line, check whether line[0], i.e. the user, is already a key in the dictionary main_data, and if so, append the data to main_data[line[0]].

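for line in read_csv:
    # main_data was pre-filled with every username, so a dict membership
    # test replaces the loop over username_list
    if line[0] in main_data:
        # Append the usage data from the second and third columns
        main_data[line[0]].append(line[1])
        main_data[line[0]].append(line[2])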
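A self-contained sketch of the single-loop approach across all the daily files, assuming the same column layout as in the question. The function name `collect_usage` and the in-memory sample files are hypothetical. A dict `in` check is an average O(1) hash lookup, so each file costs one pass over its rows instead of rows × users comparisons. One behavioural difference: this appends every matching row, whereas the `break` in the question kept only the first match per user per file; the results agree when each username appears at most once per file.

```python
import csv
import io


def collect_usage(username_list, csv_files):
    """Gather columns 1 and 2 for every listed user across all daily CSV files."""
    # Pre-populate the dict so only users from the username list are collected
    main_data = {user: [] for user in username_list}
    for f in csv_files:
        for line in csv.reader(f):
            # Single pass: a dict membership test replaces the loop over usernames
            # (`line and` skips blank lines, which csv.reader returns as [])
            if line and line[0] in main_data:
                main_data[line[0]].append(line[1])
                main_data[line[0]].append(line[2])
    return main_data


# Hypothetical in-memory stand-ins for the usagedata_{i}.csv files
files = [
    io.StringIO("alice,10,20\nmallory,9,9\n"),
    io.StringIO("alice,11,21\nbob,3,4\n"),
]
print(collect_usage(["alice", "bob"], files))
```

With real files, csv_files can be any iterable of open file objects, for example file handles opened in turn from the usagedata_{i}.csv paths.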

Answer 1: score 0, posted 2021-11-07 04:32:13
