Very noob Python question about optimising some nested for loops

I am very new to Python, and have just got a small piece of code working that compiles some user data into a single file. But since I am still learning, I don't just want it to run; I want to actually use the functionality Python provides. For reference, here is the part of the code I think can be made faster.

In short, I have a list of usernames in a text file, and usage data for each user over a 4-month period, with one CSV file per day.

The logic is:

loop over each CSV:
    loop over each line in that CSV:
         loop over the list of usernames:
               if the username matches, append the user data for that user

The inner two for loops are what I am really focusing on improving, since that is where the bottleneck is, given the number of users. In what follows, username_list is a list that I read from the text file and read_csv is a list that is read from the CSV file. The working code is as follows:

import csv

#Initialise a dictionary of lists to store the final data and read the keys (which are the usernames)
main_data = {}
with open(".\\listofusernames.txt") as usernames:
    username_list = usernames.read().splitlines()

for user in username_list:
    main_data[user] = []

#Loop over the CSV files with usage data for each day for 3 months
for i in range(1,91):
    csvdir = f".\\csvfiles\\usagedata_{i}.csv"
    with open(csvdir, 'r') as daily_usage_csv:
        read_csv = list(csv.reader(daily_usage_csv))

    #Nested loops over the username list and CSV to get the data for the correct user
    for user in username_list:
        for line in read_csv:
            #Username is stored in the first column of CSV so we do an if statement on index 0 of the line
            if line[0] == user:
                #Usage data is in the second and third column of the CSV, so we append index 1 and 2
                main_data[user].append(line[1])
                main_data[user].append(line[2])
                break

The place where I really want to optimise is the inner two for loops, over the username list and the lines in the CSV. I was hoping to do this using Python's map() function rather than an explicit for loop. The issue is that the object I am iterating over is not what indexes the thing I am appending to, so I am not sure how to implement that.

Can anyone give some simple tips for how to make the loops faster here?

You don't need two for loops. You need only one: iterate over the CSV rows for the usage data, check whether line[0] (i.e. the user) is already present as a key in the dictionary main_data, and if it is, append to main_data.

for line in read_csv:
    if line[0] in main_data:
        main_data[line[0]].append(line[1])
        main_data[line[0]].append(line[2])
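
For completeness, here is a minimal, self-contained sketch of how that single loop might sit inside the per-file loop. The function name collect_usage and the in-memory sample data are made up for illustration, and it assumes each username appears at most once per CSV (the original code's break only ever took the first match per file, whereas this version appends every matching row). The speed-up comes from the fact that `in` on a dictionary is a hash lookup rather than a scan over the whole username list.

```python
import csv
import io


def collect_usage(usernames, csv_files):
    """Gather columns 1 and 2 of every row whose first column is a known username.

    `collect_usage` is a hypothetical helper, not part of the original code.
    """
    main_data = {user: [] for user in usernames}
    for csv_file in csv_files:
        for row in csv.reader(csv_file):
            # Skip blank or short rows, then do one dictionary lookup per row
            # instead of looping over every username.
            if len(row) >= 3 and row[0] in main_data:
                main_data[row[0]].extend(row[1:3])
    return main_data


# Tiny in-memory stand-ins for two daily CSV files (made-up data).
day_1 = io.StringIO("alice,10,20\nbob,1,2\nmallory,9,9\n")
day_2 = io.StringIO("bob,3,4\nalice,30,40\n")

result = collect_usage(["alice", "bob"], [day_1, day_2])
print(result)
```

With real files, the same function could be fed open file objects for usagedata_1.csv to usagedata_90.csv in place of the io.StringIO objects.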
        
