Pandas adding header to the output file after merging multiple CSV files

import pandas as pd
import os

file1 = 'https://public.fyers.in/sym_details/NSE_CM.csv'
file2 = 'https://public.fyers.in/sym_details/NSE_FO.csv'
file3 = 'https://public.fyers.in/sym_details/BSE_CM.csv'
CHUNK_SIZE = 10 ** 6
csv_file_list = [file1, file2, file3]
output_file = "/content/output.csv"

for csv_file_name in csv_file_list:
  skipRows = [2022,92805]
  chunk_container = pd.read_csv(csv_file_name, chunksize=CHUNK_SIZE, skiprows=skipRows)
  for chunk in chunk_container:
    headerList =["fytoken", "symbol", "instrumentType","lotSize","tickSize","ISIN","tradingSession","lastUpdate","expiryDate","symbolTicker","exchange","segment","scripCode","scripName","scripToken","strikePrice","optionType"]
    chunk.to_csv(output_file,header=headerList, mode="a", index=False)

I want to merge the three CSV files and add a header to the output file. Instead, the output file ends up with the header repeated at the start of each CSV's data.

You are reading the content in chunks and passing header=headerList on every to_csv call, so the header is written again for each chunk.

Instead, try the following:

import pandas as pd

file1 = 'https://public.fyers.in/sym_details/NSE_CM.csv'
file2 = 'https://public.fyers.in/sym_details/NSE_FO.csv'
file3 = 'https://public.fyers.in/sym_details/BSE_CM.csv'
CHUNK_SIZE = 10 ** 6
csv_file_list = [file1, file2, file3]
output_file = "./content/output.csv"

headerList = ["fytoken", "symbol", "instrumentType", "lotSize", "tickSize", "ISIN", "tradingSession",
              "lastUpdate", "expiryDate", "symbolTicker", "exchange", "segment", "scripCode", "scripName",
              "scripToken", "strikePrice", "optionType"]

df = pd.DataFrame(columns=headerList)
df.to_csv(output_file, index=False)

for csv_file_name in csv_file_list:
    skipRows = [2022, 92805]
    with pd.read_csv(csv_file_name, chunksize=CHUNK_SIZE, skiprows=skipRows) as chunk_container:
        for chunk in chunk_container:
            chunk.to_csv(output_file, header=False, mode="a", index=False)

Here we create a CSV file containing only the header row beforehand, then append the data read from the URLs above to that same file without writing a header again.
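
A variation on the same idea, as a minimal sketch rather than a definitive implementation: track with a flag whether the header has been written yet, so no separate header-only file is needed. The helper name merge_csvs, the two-column header, and the small temporary files are hypothetical stand-ins for the real URLs. The sketch also assumes each input file has its own header row, which is discarded and replaced by header_list.

```python
import os
import tempfile

import pandas as pd


def merge_csvs(paths, output_file, header_list, chunk_size=10**6):
    """Write every chunk of every input to output_file, emitting the header once."""
    write_header = True  # stays True only until the first chunk has been written
    for path in paths:
        # Assumption: each input starts with its own header row. header=0 drops it,
        # and names= relabels the columns with header_list.
        with pd.read_csv(path, header=0, names=header_list, chunksize=chunk_size) as reader:
            for chunk in reader:
                chunk.to_csv(
                    output_file,
                    mode="w" if write_header else "a",  # first write replaces any stale file
                    header=write_header,
                    index=False,
                )
                write_header = False


# Demo on three tiny local files (hypothetical data standing in for the URLs).
with tempfile.TemporaryDirectory() as tmp:
    inputs = []
    for i in range(3):
        path = os.path.join(tmp, f"part{i}.csv")
        pd.DataFrame({"a": [i, i + 10], "b": ["x", "y"]}).to_csv(path, index=False)
        inputs.append(path)

    merged_path = os.path.join(tmp, "merged.csv")
    # chunk_size=1 forces several chunks per file to exercise the flag.
    merge_csvs(inputs, merged_path, ["symbol", "value"], chunk_size=1)

    with open(merged_path) as fh:
        merged_lines = fh.read().splitlines()
```

Because the first write uses mode="w", rerunning the script replaces an old output file instead of appending to it, which the append-only version would not do.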
