Python, multithreading: is it safe to use pandas "to_csv" on a common file?

Question

I've got some code that works pretty nicely. It's a while-loop that goes through a list of dates, finds files on my HDD that correspond to those dates, does some calculations with those files, and then appends the output to a "results.csv" file using the command:

my_df.to_csv("results.csv",mode = 'a')

I'm wondering if it's safe to create a new thread for each date, so that the body of the while loop runs for several dates at a time?

My code:

import datetime, time, os
import sys
import threading
import helperPY #a python file containing the logic I need

class myThread (threading.Thread):
    def __init__(self, threadID, name, counter,sn, m_date):
        threading.Thread.__init__(self)
        self.threadID = threadID
        self.name = name
        self.counter = counter
        self.sn = sn
        self.m_date = m_date
    def run(self):
        print "Starting " + self.name
        m_runThis(self.sn, self.m_date)  # use the values stored on this thread; a bare m_date is undefined here
        print "Exiting " + self.name

def m_runThis(sn, m_date):
    helperPY.helpFn(sn,m_date)  #this is where the "my_df.to_csv()" is called

sn = 'XXXXXX'
today=datetime.datetime(2016,9,22)
yesterday=datetime.datetime(2016,6,13)

threadList = []
i_threadlist=0
while(today>yesterday):
    threadList.append(myThread(i_threadlist, str(today), i_threadlist,sn,today))
    threadList[i_threadlist].start()
    i_threadlist = i_threadlist +1
    today = today-datetime.timedelta(1)

Answer 1

Writing to the same file from multiple threads is not safe. But you can create a lock to protect that one operation while letting the rest run in parallel. Your to_csv call isn't shown, but you could create the lock:

csv_output_lock = threading.Lock()

and pass it to helperPY.helpFn. When you get to the write operation, do:

with csv_output_lock:
    my_df.to_csv("results.csv",mode = 'a')

You get parallelism for the other operations (subject to the GIL, of course), while the file access is protected.
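To make the lock-passing idea concrete, here is a minimal sketch of how the pieces might fit together. Since helperPY isn't shown, help_fn below is a hypothetical stand-in for helperPY.helpFn, and run_all, out_path and the tiny one-row DataFrame are assumptions made only for illustration. It is written for Python 3 and also joins the threads, which the question's loop does not do.

```python
import datetime
import os
import tempfile
import threading

import pandas as pd

# One lock shared by every thread that appends to the same CSV file.
csv_output_lock = threading.Lock()


def help_fn(sn, m_date, out_path, lock):
    """Hypothetical stand-in for helperPY.helpFn: compute, then append under the lock."""
    # The "calculation" happens outside the lock, so threads may overlap here.
    my_df = pd.DataFrame({"sn": [sn], "date": [m_date.strftime("%Y-%m-%d")]})
    with lock:
        # Only one thread at a time executes this block.
        # Write the header only if the file does not exist yet.
        write_header = not os.path.exists(out_path)
        my_df.to_csv(out_path, mode="a", header=write_header, index=False)


def run_all(sn, dates, out_path):
    """Start one thread per date and wait until every append has finished."""
    threads = [
        threading.Thread(target=help_fn, args=(sn, d, out_path, csv_output_lock))
        for d in dates
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


if __name__ == "__main__":
    start = datetime.datetime(2016, 9, 22)
    example_dates = [start - datetime.timedelta(days=i) for i in range(5)]
    example_path = os.path.join(tempfile.mkdtemp(), "results.csv")
    run_all("XXXXXX", example_dates, example_path)
    print(pd.read_csv(example_path))
```

Note that the rows land in the file in whatever order the threads acquire the lock, so if the order of dates matters the result would need to be sorted afterwards.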

Answer 1: score 2, accepted, 2016-09-30 18:05:58
