Multiprocessing in Python (going from a for loop to a multiprocessing for loop)
I have a script that works. It has a for loop that I'd like to speed up by incorporating multiprocessing.
The code without multiprocessing is as follows:
Symbol= Symbol[0:] #slicing to choose which stocks to look at
################################for loop
for item in Symbol:
    print item
    try:
        serious=web.DataReader([item], 'yahoo', start, end)['Adj Close']
        serious2=serious.loc[:, item].tolist() #extract the column of 'Adj Close'
        tickerlistori.append(item)
        valuemax = max(serious2)
        indexmax = serious2.index(max(serious2))
        valuemin = min(serious2)
        indexmin = serious2.index(min(serious2))
        pricecurrent = serious2[-1]
        if valuemax>30 and valuemin<2 and pricecurrent<2.5:
            tickerlist.append(item)
            maxpricelist.append(valuemax)
            minpricelist.append(valuemin)
    except RemoteDataError:
        pass
print tickerlist
The second code block below is "with parallel processing":
Symbol= Symbol[0:] #slicing to choose which stocks to look at
############ multi processing before the for loop
def search1(Symbol):
    for item in Symbol:
        print item #trying to see why the tickers are messed up
        try:
            serious=web.DataReader([item], 'yahoo', start, end)['Adj Close']
            serious2=serious.loc[:, item].tolist() #extract the column of 'Adj Close'
            tickerlistori.append(item)
            valuemax = max(serious2)
            indexmax = serious2.index(max(serious2))
            valuemin = min(serious2)
            indexmin = serious2.index(min(serious2))
            pricecurrent = serious2[-1]
            if valuemax>30 and valuemin<2 and pricecurrent<2.5:
                tickerlist.append(item)
                maxpricelist.append(valuemax)
                minpricelist.append(valuemin)
        except RemoteDataError:
            pass
pool = Pool(processes=4)
tickerlist = pool.map(search1, Symbol)
print tickerlist
The first one works fine. The second runs without error, but the Symbol that gets fed into pool.map(search1, Symbol) doesn't seem right.
Thanks in advance.
(Symbol is just supposed to be a list of stock tickers.)
import matplotlib.pyplot as plt
import csv
import pandas as pd
import datetime
import pandas.io.data as web
from pandas.io.data import DataReader, SymbolWarning, RemoteDataError
from filesortfunct import filesort
from scipy import stats
from scipy.stats.stats import pearsonr
import numpy as np
import math
from multiprocessing import Pool
import warnings
warnings.filterwarnings("ignore")
#decide the two dates between which to look at stock prices
start = datetime.datetime.strptime('2/10/2015', '%m/%d/%Y')
end = datetime.datetime.strptime('2/25/2016', '%m/%d/%Y')
#intended to collect indices and min/max prices
#global tickerlist, maxpricelist, minpricelist, tickerlistori
tickerlistori=[] #list of stocks available from google finance
tickerlist=[]
maxpricelist = []
minpricelist =[]
datanamelist= ['NYSE.csv']#,'NASDAQ.csv','AMEX.csv']
for each in datanamelist:
    #print each #print out which stock exchange is being looked at
    dataname= each #csv file from which to extract stock tickers
    new = 'new'
    df = pd.read_csv(dataname, sep=',')
    df = df[['Symbol']]
    df.to_csv(new+dataname, sep=',', index=False)
    x=open(new+dataname,'rb') #convert it into a form more manageable
    f = csv.reader(x) # csv is binary
    Symbol = zip(*f)
    #print type(Symbol) #list format
    Symbol=Symbol[0] #pick out the first column
    # Symbol = Symbol[1:len(Symbol)] #remove the first row "symbol" header
    Symbol = Symbol[3210:len(Symbol)]
    Symbol= Symbol[0:] #slicing to choose which stocks to look at
    #print Symbol
def search1(item):
    print item #trying to see why the tickers are messed up
    try:
        serious=web.DataReader([item], 'yahoo', start, end)['Adj Close']
        serious2=serious.loc[:, item].tolist() #extract the column of 'Adj Close'
        valuemax = max(serious2)
        indexmax = serious2.index(max(serious2))
        valuemin = min(serious2)
        indexmin = serious2.index(min(serious2))
        pricecurrent = serious2[-1]
        if valuemax>30 and valuemin<2 and pricecurrent<2.5:
            return item, valuemax, valuemin
    except RemoteDataError:
        pass
pool = Pool(processes=4) #workers start when the Pool is created; Pool has no start() method
for result in pool.map(search1, Symbol):
    if result:
        tickerlist.append(result[0])
        maxpricelist.append(result[1])
        minpricelist.append(result[2])
print tickerlist
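The collection loop at the end of the script relies on search1 returning either a (ticker, max, min) tuple or None. Here is a minimal sketch of that pattern. It makes two assumptions that are not in the original script: the price histories are made up (FAKE_PRICES and the tickers are hypothetical, so no download is needed), and the pool comes from multiprocessing.dummy, a thread-backed stand-in that offers the same map() interface as multiprocessing.Pool. The screen function is likewise a hypothetical stand-in for search1.

```python
from multiprocessing.dummy import Pool  # thread-backed stand-in, same map() interface

# Hypothetical price histories standing in for the downloaded 'Adj Close' columns.
FAKE_PRICES = {
    "AAA": [35.0, 12.0, 1.5, 2.0],  # meets all three conditions
    "BBB": [10.0, 9.0, 8.0, 7.5],   # never rises above 30
    "CCC": [40.0, 5.0, 1.0, 3.0],   # last price is not below 2.5
}

def screen(item):
    """Return (ticker, max, min) if the ticker passes the filter, else None."""
    prices = FAKE_PRICES.get(item)
    if prices is None:
        # Plays the role of the RemoteDataError branch in the script.
        return None
    valuemax = max(prices)
    valuemin = min(prices)
    pricecurrent = prices[-1]
    if valuemax > 30 and valuemin < 2 and pricecurrent < 2.5:
        return item, valuemax, valuemin
    return None

tickerlist, maxpricelist, minpricelist = [], [], []

pool = Pool(2)
# map() returns one result per input, in input order; None marks a rejected ticker.
for result in pool.map(screen, ["AAA", "BBB", "CCC", "MISSING"]):
    if result:
        tickerlist.append(result[0])
        maxpricelist.append(result[1])
        minpricelist.append(result[2])
pool.close()
pool.join()

print(tickerlist)  # ['AAA']
```

With a real multiprocessing.Pool, each worker process has its own copy of the module-level lists, so appending to them inside the worker would not change the parent's lists. Returning the values and collecting them in the parent avoids that problem.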
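You've got several problems. map will enumerate Symbol and run the worker once for each element, so the worker doesn't need to enumerate it again in a for loop. The worker should also return its result rather than append to the global lists, so that the parent can collect the values that map hands back. Here's an update: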
Symbol= Symbol[0:] #slicing to choose which stocks to look at
############ multi processing before the for loop
def search1(item):
    print item #trying to see why the tickers are messed up
    try:
        serious=web.DataReader([item], 'yahoo', start, end)['Adj Close']
        serious2=serious.loc[:, item].tolist() #extract the column of 'Adj Close'
        valuemax = max(serious2)
        indexmax = serious2.index(max(serious2))
        valuemin = min(serious2)
        indexmin = serious2.index(min(serious2))
        pricecurrent = serious2[-1]
        if valuemax>30 and valuemin<2 and pricecurrent<2.5:
            return item, valuemax, valuemin
    except RemoteDataError:
        pass
pool = Pool(processes=4)
for result in pool.map(search1, Symbol):
    if result:
        tickerlist.append(result[0])
        maxpricelist.append(result[1])
        minpricelist.append(result[2])
print tickerlist
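To see why the tickers looked "messed up" in the original parallel version, it may help to compare the two worker shapes side by side. The sketch below is illustrative only: the ticker list is hypothetical, both worker functions are made-up stand-ins for search1, and the pool comes from multiprocessing.dummy (a thread-backed stand-in with the same map() interface as multiprocessing.Pool) so the example runs without spawning processes.

```python
from multiprocessing.dummy import Pool  # thread-backed stand-in, same map() interface

def worker_with_loop(symbol):
    # Shaped like the original search1: it loops over its argument.
    # map() passes ONE ticker string per call, so the loop walks its characters.
    return [item for item in symbol]

def worker_per_item(item):
    # Shaped like the updated search1: it handles the single ticker it receives.
    return item

symbols = ["AAPL", "GE", "XOM"]  # hypothetical tickers for illustration

pool = Pool(2)
looped = pool.map(worker_with_loop, symbols)
direct = pool.map(worker_per_item, symbols)
pool.close()
pool.join()

print(looped)  # [['A', 'A', 'P', 'L'], ['G', 'E'], ['X', 'O', 'M']]
print(direct)  # ['AAPL', 'GE', 'XOM']
```

In the first case each "item" the worker sees is a single character, which is consistent with the odd tickers printed by the original code. In the second case each call receives a whole ticker.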