Python, multithreading: is it safe to use pandas "to_csv" on a common file?

I've got some code that works pretty nicely. It's a while-loop that goes through a list of dates, finds the files on my HDD that correspond to those dates, does some calculations with those files, and then appends the output to a "results.csv" file using the command:

my_df.to_csv("results.csv",mode = 'a')

I'm wondering if it's safe to create a new thread for each date, so that the body of the while loop runs for several dates at a time?

My code:

import datetime, time, os
import sys
import threading
import helperPY #a python file containing the logic I need

class myThread (threading.Thread):
    def __init__(self, threadID, name, counter,sn, m_date):
        threading.Thread.__init__(self)
        self.threadID = threadID
        self.name = name
        self.counter = counter
        self.sn = sn
        self.m_date = m_date
    def run(self):
        print("Starting " + self.name)
        # Use the values stored on this thread; a bare m_date is undefined here.
        m_runThis(self.sn, self.m_date)
        print("Exiting " + self.name)

def m_runThis(sn, m_date):
    helperPY.helpFn(sn,m_date)  #this is where the "my_df.to_csv()" is called

sn = 'XXXXXX'
today = datetime.datetime(2016, 9, 22)
yesterday = datetime.datetime(2016, 6, 13)

threadList = []
i_threadlist=0
while(today>yesterday):
    threadList.append(myThread(i_threadlist, str(today), i_threadlist,sn,today))
    threadList[i_threadlist].start()
    i_threadlist = i_threadlist +1
    today = today-datetime.timedelta(1)

Writing to the same file from multiple threads is not safe. But you can create a lock to protect that one operation while letting the rest run in parallel. Your to_csv call isn't shown, but you could create the lock

csv_output_lock = threading.Lock()

and pass it to helperPY.helpFn. When you get to the operation, do

with csv_output_lock:
    my_df.to_csv("results.csv",mode = 'a')

You get parallelism for the other operations (subject to the GIL, of course), but the file access is protected.
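Putting the pieces together, here is a minimal end-to-end sketch of the lock approach. Since helperPY isn't shown, help_fn below is a hypothetical stand-in for helperPY.helpFn with a placeholder calculation; run_all, out_path and the temporary directory are also assumptions made for illustration, as is the choice to write the CSV header only once. It also joins the threads, which the question's loop does not do.

```python
import datetime
import os
import tempfile
import threading

import pandas as pd

# One lock shared by every worker; only the CSV append is serialized.
csv_output_lock = threading.Lock()


def help_fn(sn, m_date, out_path, lock):
    """Hypothetical stand-in for helperPY.helpFn: compute, then append under the lock."""
    # Placeholder calculation; this part runs without holding the lock.
    my_df = pd.DataFrame({"sn": [sn], "date": [m_date.strftime("%Y-%m-%d")]})
    with lock:
        # Write the header only if the file does not exist yet or is still empty.
        write_header = not os.path.exists(out_path) or os.path.getsize(out_path) == 0
        my_df.to_csv(out_path, mode="a", header=write_header, index=False)


def run_all(sn, start, end, out_path):
    """Start one thread per date in (start, end], newest first, and wait for all of them."""
    threads = []
    day = end
    while day > start:
        t = threading.Thread(
            target=help_fn,
            args=(sn, day, out_path, csv_output_lock),
            name=str(day),
        )
        t.start()
        threads.append(t)
        day -= datetime.timedelta(days=1)
    for t in threads:
        t.join()  # make sure every append has finished before the file is read
    return len(threads)


# Use a temporary directory so the sketch never touches a real results.csv.
out_path = os.path.join(tempfile.mkdtemp(), "results.csv")
n_threads = run_all(
    "XXXXXX",
    datetime.datetime(2016, 9, 12),
    datetime.datetime(2016, 9, 22),
    out_path,
)
result = pd.read_csv(out_path)
```

The rows land in whatever order the threads happen to acquire the lock, so sort the file afterwards if the date order matters.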
