Read and output csv file in Pandas based on csv name

I am trying to write Python code that automatically reads csv files from a certain folder and saves the final csv under a name derived from the original csv names. For instance, in the folder "Torque and Drag" I have 4 raw csv files:

PU 30 0.15 0.25 1.35 0.csv,
PU 0 0.15 0.25 1.35 0.csv,
DA 30 0.15 0.25 1.35 8.csv,
DA 0 0.15 0.25 1.35 8.csv.

I want my Python function to look into this folder, select the csv files starting with PU, put them in the same dataframe, and output them as a single final csv. The same goes for csv files starting with DA. The folder "Torque and Drag" will be constantly updated with similar csv files, so I might have 20 csv files starting with PU and 20 starting with DA, and I want my function to output only two final csv files: one for PU and one for DA. That is, PU.csv will contain the data of my 20 PU csv files appended one after another, and likewise for DA.csv. I want the header written only once and skipped as the other csvs are appended. I wrote the following code, but it outputs only 1 csv file containing all of the csvs in that folder (PU and DA) instead of separating them by the first two letters of the csv name. I could not work out how to write the condition on the first two letters of the csv name.

import pandas as pd
import os
import glob

def csv_upload():
   folder_name = 'Torque and Drag' #write your folder name where all csvs will be kept
   file_type = 'csv'
   seperator =','

   path = os.getcwd()
   csv_files = glob.glob(os.path.join(path+ "/" + folder_name, "*.csv"))
   for file in csv_files:
      csv_name = file.split("\\")[-1]

   name = csv_name.split(" ")[0]
   initials = f"{name}.csv"
   

   df = pd.concat([pd.read_csv(f, sep=seperator) for f in glob.glob(folder_name + "/*."+file_type) if name=='PU'],
               ignore_index=True)

   df.to_csv(f'{name}.csv',mode='a',header=True,index=False)
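
In the code above, `name` is taken from only one file after the loop has finished, and the `if name=='PU'` test inside the list comprehension does not depend on `f`, so it cannot filter individual files. A minimal sketch of the grouping described in the question, one output csv per file-name prefix, might look like the following. The function name `combine_by_prefix` and the `out_dir` parameter are hypothetical, and the sketch assumes every file in a group shares the same columns.

```python
from collections import defaultdict
from pathlib import Path

import pandas as pd


def combine_by_prefix(folder, out_dir=".", sep=","):
    """Write one combined csv per file-name prefix (e.g. PU.csv, DA.csv).

    Hypothetical helper: assumes the prefix is the text before the first
    space in the file name, as in 'PU 30 0.15 0.25 1.35 0.csv'.
    """
    # Group the input paths by their prefix.
    groups = defaultdict(list)
    for path in sorted(Path(folder).glob("*.csv")):
        prefix = path.stem.split(" ")[0]
        groups[prefix].append(path)

    written = {}
    for prefix, paths in groups.items():
        frames = [pd.read_csv(p, sep=sep) for p in paths]
        combined = pd.concat(frames, ignore_index=True)
        out_path = Path(out_dir) / f"{prefix}.csv"
        # Default mode "w" overwrites the file, so the header appears once
        # and re-running does not append duplicate rows.
        combined.to_csv(out_path, index=False)
        written[prefix] = out_path
    return written
```

Calling `combine_by_prefix("Torque and Drag")` from the parent directory would then write `PU.csv` and `DA.csv` into the current directory. Keeping `out_dir` different from the input folder avoids picking up the combined files as inputs on the next run.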

I wrote this code, and it works OK, but it doesn't look optimal to me because I am repeating the same operations for each csv output. How can I eliminate that repetition?

import pandas as pd
import os
import glob

def csv_upload():
   folder_name = 'Torque and Drag' #write your folder name where all csvs will be kept
   file_type = 'csv'
   seperator =','
   path = os.getcwd()
   csv_files = glob.glob(os.path.join(path + "/" + folder_name, "*.csv"))

   for file in csv_files:
      csv_name = file.split("\\")[-1]
      name = csv_name.split(" ")[0]
      initials = f"{name}.csv"
    
   PU_df = pd.concat([pd.read_csv(f, sep=seperator) for f in csv_files if f.split("\\")[-1][0:2]=='PU'], ignore_index=True)
   DA_df = pd.concat([pd.read_csv(f, sep=seperator) for f in csv_files if f.split("\\")[-1][0:2]=='DA'], ignore_index=True)
    
   PU_df.rename(columns={'MD\r\n[m]':"MD (m)",'Hole size\r\n[inch]':'Hole size (inch)','Tool OD\r\n[inch]':'Tool OD (inch)',
                       'Tool ID\r\n[inch]':'Tool ID (inch)',
                       'Weight\r\n[kg]':'Weight (kg)','Inc\r\n[deg]':'Inc (deg)',
                       'Dir\r\n[deg Az]':'Dir (deg Az)',
                      'Friction\r\nFactor':'Friction (nFactor)',
                      'Tension limit\r\n[kg]':'Tension limit (kg)',
                      'Torsion limit\r\n[kN.m]':'Torsion limit (kN.m)',
                      'F Sin.\r\n[kg]':'F Sin (kg)','F Hel.\r\n[kg]':'F Hel (kg)',
                      'F normal\r\n[kg]':'F normal (kg)','Tension\r\n[kg]':'Tension (kg)',
                      'Torque\r\n[kN.m]':'Torque, (kN.m)'}, inplace = True)

   DA_df.rename(columns={'MD\r\n[m]':"MD (m)",'Hole size\r\n[inch]':'Hole size (inch)','Tool OD\r\n[inch]':'Tool OD (inch)',
                       'Tool ID\r\n[inch]':'Tool ID (inch)',
                       'Weight\r\n[kg]':'Weight (kg)','Inc\r\n[deg]':'Inc (deg)',
                       'Dir\r\n[deg Az]':'Dir (deg Az)',
                      'Friction\r\nFactor':'Friction (nFactor)',
                      'Tension limit\r\n[kg]':'Tension limit (kg)',
                      'Torsion limit\r\n[kN.m]':'Torsion limit (kN.m)',
                      'F Sin.\r\n[kg]':'F Sin (kg)','F Hel.\r\n[kg]':'F Hel (kg)',
                      'F normal\r\n[kg]':'F normal (kg)','Tension\r\n[kg]':'Tension (kg)',
                      'Torque\r\n[kN.m]':'Torque, (kN.m)'}, inplace = True)

   maxMD = PU_df['MD (m)'].max()
   PU_df['Depth (m)'] = maxMD - PU_df['MD (m)']
   PU_df['State'] = csv_name.split(" ")[0]
   PU_df['RPM'] = csv_name.split(" ")[1]
   PU_df['Casing Friction Factor'] = csv_name.split(" ")[2]
   PU_df['Open Hole Friction Factor'] = csv_name.split(" ")[3]
   PU_df['MW'] = csv_name.split(" ")[4]
   PU_df['WOB'] = csv_name.split(" ")[5][0]

   maxMD = DA_df['MD (m)'].max()
   DA_df['Depth (m)'] = maxMD - DA_df['MD (m)']
   DA_df['State'] = csv_name.split(" ")[0]
   DA_df['RPM'] = csv_name.split(" ")[1]
   DA_df['Casing Friction Factor'] = csv_name.split(" ")[2]
   DA_df['Open Hole Friction Factor'] = csv_name.split(" ")[3]
   DA_df['MW'] = csv_name.split(" ")[4]
   DA_df['WOB'] = csv_name.split(" ")[5][0]

   PU_df.to_csv('PU.csv',mode='a',header=True,index=False)
   DA_df.to_csv('DA.csv',mode='a',header=True,index=False)

csv_upload()
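
One possible way to remove the repetition is to move the per-file work into a helper and loop over the prefixes. The sketch below is only an illustration under stated assumptions: the names `clean_header`, `read_one`, `build_outputs` and `NAME_FIELDS` are hypothetical, and the file names are assumed to always hold six space-separated fields. It also differs from the code above in two ways. Each row is tagged with the values from its own file name (the code above uses `csv_name` from the last file in the loop for every row), and the headers are cleaned by a regular expression, which yields `Friction Factor`, `F Sin. (kg)` and `Torque (kN.m)` rather than the hand-written names.

```python
import re
from pathlib import Path

import pandas as pd

# Assumed meaning of the six space-separated fields in each file name,
# taken from the column names used in the question.
NAME_FIELDS = ["State", "RPM", "Casing Friction Factor",
               "Open Hole Friction Factor", "MW", "WOB"]


def clean_header(col):
    r"""Turn 'MD\r\n[m]' into 'MD (m)'; other line breaks become a space."""
    col = re.sub(r"\s*[\r\n]+\s*\[(.+?)\]", r" (\1)", col)
    return re.sub(r"\s*[\r\n]+\s*", " ", col).strip()


def read_one(path, sep=","):
    """Read one file and tag every row with the values in its own name."""
    df = pd.read_csv(path, sep=sep).rename(columns=clean_header)
    for field, value in zip(NAME_FIELDS, Path(path).stem.split(" ")):
        df[field] = value  # kept as strings, as in the question
    return df


def build_outputs(folder, prefixes=("PU", "DA"), out_dir=".", sep=","):
    """Build and write one combined csv per prefix; return the dataframes."""
    outputs = {}
    for prefix in prefixes:
        paths = sorted(p for p in Path(folder).glob("*.csv")
                       if p.name.startswith(prefix + " "))
        if not paths:
            continue  # pd.concat raises ValueError on an empty list
        df = pd.concat([read_one(p, sep) for p in paths], ignore_index=True)
        # Depth is measured from the deepest MD in the combined frame,
        # mirroring the calculation in the question.
        df["Depth (m)"] = df["MD (m)"].max() - df["MD (m)"]
        df.to_csv(Path(out_dir) / f"{prefix}.csv", index=False)
        outputs[prefix] = df
    return outputs
```

With this layout, `build_outputs("Torque and Drag")` would handle both groups, and supporting a further prefix would only mean adding it to `prefixes`.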
