How can I optimise this Python code?
I want to get the mean of the Year value over all IndicatorCode entries for every country:
import numpy as np
import pandas as pd

datos = pd.read_csv("suramerica.csv")
media = list()
agricultura = list()
flag = 0
paises = np.array(['Antigua and Barbuda', 'Argentina', 'Chile', 'Colombia'])
indicadores_agricultura = np.array(['EG.ELC.ACCS.RU.ZS', 'EG.NSF.ACCS.RU.ZS'])

for i in paises:
    for j in indicadores_agricultura:
        # Full scan of the data for every (country, indicator) pair
        for k in range(len(datos)):
            if i == datos['CountryName'][k] and j == datos['IndicatorCode'][k]:
                flag = 1
                media.append(datos['Year'][k])
    if flag == 1:
        agricultura.append(np.array([i, np.mean(media)]))
    del media[:]
    flag = 0

pd.DataFrame(agricultura, columns=['Paises', 'Agricultura y Desarrollo Rural'])
Here is a DataFrame of the result:
If you need access to the csv: Suramerica.csv
This code takes a long time to execute. Thanks for your time - any advice will be great.
There is no need to traverse the complete data for every combination. I use a dict to collect the Year values per country in a single pass over the rows, then compute np.mean on each collected list. This should greatly improve the execution speed. Here's the code:
import numpy as np
import pandas as pd

datos = pd.read_csv("suramerica.csv")
agricultura = list()
output = {}
paises = np.array(['Antigua and Barbuda', 'Argentina', 'Chile', 'Colombia'])
indicadores_agricultura = np.array(['EG.ELC.ACCS.RU.ZS', 'EG.NSF.ACCS.RU.ZS'])

# Single pass over the rows
for k in range(len(datos)):
    cn = datos['CountryName'][k]
    indicator_code = datos['IndicatorCode'][k]
    # change1: make sure every country seen has an entry
    if cn not in output:
        output[cn] = []
    if cn in paises and indicator_code in indicadores_agricultura:
        year = datos['Year'][k]
        # Store the value; without this the lists stay empty and every mean is 0.0
        output[cn].append(year)

for o in output:
    # change2: countries with no matching rows get 0.0
    media = output.get(o)
    if not media:
        media = 0.0
    agricultura.append(np.array([o, np.mean(media)]))

output2 = pd.DataFrame(agricultura, columns=['Paises', 'Agricultura y Desarrollo Rural'])
print(output2)
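
The same single-pass idea can also be expressed with pandas' own filtering and grouping instead of a Python loop. The following is only a minimal sketch: it assumes the CSV has the CountryName, IndicatorCode and Year columns used above, and the small inline frame is a made-up stand-in for suramerica.csv, so its values are purely illustrative.

```python
import pandas as pd

# Made-up stand-in for pd.read_csv("suramerica.csv");
# only the column names are taken from the question.
datos = pd.DataFrame({
    "CountryName": ["Argentina", "Argentina", "Chile", "Chile", "Peru"],
    "IndicatorCode": [
        "EG.ELC.ACCS.RU.ZS", "EG.NSF.ACCS.RU.ZS",
        "EG.ELC.ACCS.RU.ZS", "SP.POP.TOTL", "EG.ELC.ACCS.RU.ZS",
    ],
    "Year": [2000, 2010, 1990, 2012, 2005],
})

paises = ["Antigua and Barbuda", "Argentina", "Chile", "Colombia"]
indicadores_agricultura = ["EG.ELC.ACCS.RU.ZS", "EG.NSF.ACCS.RU.ZS"]

# Keep only rows whose country and indicator are both in the lists of interest
mask = (
    datos["CountryName"].isin(paises)
    & datos["IndicatorCode"].isin(indicadores_agricultura)
)

# Average Year per country, then rename to the column labels used in the question
resultado = (
    datos[mask]
    .groupby("CountryName")["Year"]
    .mean()
    .rename("Agricultura y Desarrollo Rural")
    .rename_axis("Paises")
    .reset_index()
)
print(resultado)
```

Countries from paises that have no matching rows (here Antigua and Barbuda and Colombia) simply do not appear in the result, which matches the flag check in the question rather than the 0.0 placeholder used in the dict version.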
I would start writing the loop this way:
# Iterate over row labels; enumerating the DataFrame itself would yield column names
for k, _ in enumerate(datos.index):
    cn = datos['CountryName'][k]
    ic = datos['IndicatorCode'][k]
    for i in paises:
        # Skip countries that do not match this row
        if i != cn:
            continue
        for j in indicadores_agricultura:
            if j == ic:
                flag = 1
                media.append(datos['Year'][k])
        if flag:
            agricultura.append(np.array([i, np.mean(media)]))
        del media[:]
        flag = 0
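
If an explicit loop is preferred, the per-country accumulation could be finished along these lines. This is a sketch rather than the answerer's code: it assumes the same three columns, uses made-up rows in place of the CSV, reads rows with `itertuples` instead of positional lookups, and replaces the nested membership loops with set lookups. The name `anios_por_pais` is introduced here for illustration.

```python
from collections import defaultdict

import pandas as pd

# Made-up stand-in for pd.read_csv("suramerica.csv");
# only the column names are taken from the question.
datos = pd.DataFrame({
    "CountryName": ["Argentina", "Argentina", "Chile", "Chile", "Peru"],
    "IndicatorCode": [
        "EG.ELC.ACCS.RU.ZS", "EG.NSF.ACCS.RU.ZS",
        "EG.ELC.ACCS.RU.ZS", "SP.POP.TOTL", "EG.ELC.ACCS.RU.ZS",
    ],
    "Year": [2000, 2010, 1990, 2012, 2005],
})

# Sets give constant-time membership tests
paises = {"Antigua and Barbuda", "Argentina", "Chile", "Colombia"}
indicadores_agricultura = {"EG.ELC.ACCS.RU.ZS", "EG.NSF.ACCS.RU.ZS"}

# Hypothetical helper structure: country name -> list of matching Year values
anios_por_pais = defaultdict(list)

# One pass over the rows, each row read as a namedtuple
for fila in datos.itertuples(index=False):
    if fila.CountryName in paises and fila.IndicatorCode in indicadores_agricultura:
        anios_por_pais[fila.CountryName].append(fila.Year)

# One mean per country that had at least one matching row
agricultura = [
    (pais, sum(anios) / len(anios)) for pais, anios in anios_por_pais.items()
]
resultado = pd.DataFrame(
    agricultura, columns=["Paises", "Agricultura y Desarrollo Rural"]
)
print(resultado)
```

Because the values are grouped by country while scanning, no flag variable or list reset is needed.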