
Python, pandas loop only works for the last item

Edit: my apologies, I found the issue. I was running the code from RStudio, which messed something up. I just tried it from the console and it works fine.

I know I am doing something silly, but I can't figure out what. I built this script, which reads each zip file, applies some transformations, and writes a final CSV, but for some reason only the last file gets written. The script is fully reproducible; the source files are available at the link below if you want to try to debug it.

import os

import pandas as pd

# Assumption: curDir is not defined in the snippet as posted; the current
# working directory is used here, consistent with os.listdir(os.curdir) below.
curDir = os.getcwd()

files = os.listdir(os.curdir)
files = [i for i in files if i.endswith('.zip')]
print(files)
for x in files:
    path_file = os.path.join(curDir, x)
    print(path_file)
    # read_csv infers zip compression from the extension (one file per archive)
    source = pd.read_csv(path_file,
                         skiprows=1,
                         usecols=["DISPATCH", "1", "SETTLEMENTDATE", "RUNNO", "INTERVENTION",
                                  "CASESUBTYPE", "SOLUTIONSTATUS", "NONPHYSICALLOSSES"],
                         dtype=str)

    source.rename(columns={'1': 'version'}, inplace=True)
    # Note: query() returns a new DataFrame; the result is not assigned,
    # so this line has no effect as written.
    source.query('version=="2"')

    ################ Extract UNIT, SETTLEMENTDATE, DUID, INITIALMW and export to CSV
    df_unit = source
    df_unit = df_unit.query('DISPATCH=="DUNIT" or DISPATCH=="TUNIT"')
    # Make the first row the header
    df_unit.columns = df_unit.iloc[0]
    df_unit = df_unit[1:]
    # Create a conditional column
    df_unit.loc[df_unit['DUNIT'] == 'TUNIT', 'INITIALMW1'] = df_unit['INTERVENTION']
    df_unit.loc[df_unit['DUNIT'] == 'DUNIT', 'INITIALMW1'] = df_unit['INITIALMW']
    df_unit.drop(columns=['RUNNO', '2', 'INTERVENTION', 'INITIALMW', 'DISPATCHMODE'], inplace=True)
    df_unit.rename(columns={'INITIALMW1': 'INITIALMW', 'DUNIT': 'UNIT'}, inplace=True)
    # Drop repeated header rows and zero readings
    df_unit = df_unit.query('SETTLEMENTDATE!="SETTLEMENTDATE" and INITIALMW !="0"')
    df_unit["INITIALMW"] = pd.to_numeric(df_unit["INITIALMW"])
    df_unit['SETTLEMENTDATE'] = pd.to_datetime(df_unit['SETTLEMENTDATE'])
    df_unit.head()
    # One output per input: same base name as the zip, with a .csv extension
    df_unit.to_csv(x.rsplit('.', 1)[0] + '.csv', float_format="%.4f",
                   index=False, date_format='%Y-%m-%dT%H:%M:%S.%fZ', compression='gzip')
    print(path_file)

EDIT: I added the list of files:

['PUBLIC_DAILY_201906040000_20190605040502.zip', 'PUBLIC_DAILY_201906050000_20190606040501.zip', 'PUBLIC_DAILY_201907140000_20190715040502.zip']

The files were downloaded from here.
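One thing worth ruling out when only the last output appears is a name collision, where several inputs map to the same output file and each iteration overwrites the previous one. The sketch below applies the script's own naming rule, `x.rsplit('.', 1)[0] + '.csv'`, to the three file names listed in the question. The helper name `output_name` is hypothetical and introduced only for illustration.

```python
# File names copied from the list in the question.
files = [
    "PUBLIC_DAILY_201906040000_20190605040502.zip",
    "PUBLIC_DAILY_201906050000_20190606040501.zip",
    "PUBLIC_DAILY_201907140000_20190715040502.zip",
]


def output_name(zip_name):
    """Hypothetical helper: same rule the script uses to name its CSV output."""
    return zip_name.rsplit(".", 1)[0] + ".csv"


outputs = [output_name(name) for name in files]
for src, dst in zip(files, outputs):
    print(src, "->", dst)

# If two inputs shared an output name, later iterations would overwrite
# earlier ones and only the last file would appear to be written.
assert len(set(outputs)) == len(outputs)
```

For these three names the outputs are distinct, so overwriting is not the cause here, which is consistent with the environment-related explanation in the edit at the top of the question.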

The code below works for me. Most likely you either have only one downloaded zip file in the current directory, or you are launching the Jupyter notebook or Python script from the wrong directory. You can print os.getcwd() to check. Otherwise, there is nothing wrong with the code. All the zip files have to be in the directory from which this code is run, whether through a Python script or a Jupyter notebook.

files = os.listdir(os.getcwd())
files = [i for i in files if i.endswith('.zip')]
print(files)
for x in files:
    path_file = os.path.join(os.getcwd() ,x)
    print(path_file)
    ...
    ... 
    df_unit.to_csv(x.rsplit('.', 1)[0] + '.csv',float_format="%.4f",
    index=False,date_format='%Y-%m-%dT%H:%M:%S.%fZ',compression='gzip')
    print(path_file) 
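To test the working-directory explanation above, a small diagnostic along these lines could be run from the same environment (console, IDE, or notebook) that runs the script. It is only a sketch: the function name `list_zip_files` is hypothetical, and it uses the same case-sensitive `.zip` suffix check as the loop.

```python
from pathlib import Path


def list_zip_files(directory="."):
    """Return the sorted names of the .zip files directly inside `directory`."""
    return sorted(
        p.name
        for p in Path(directory).iterdir()
        if p.is_file() and p.name.endswith(".zip")
    )


if __name__ == "__main__":
    # Path.cwd() is the process's current working directory, which an IDE
    # may set to something other than the folder holding the script.
    print("Working directory:", Path.cwd())
    print("Zip files found:", list_zip_files())
```

If the printed directory or file list differs between the console and the IDE, the loop is iterating over a different set of files than expected.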
