
Import multiple CSV files, select one column from each file and rename the column with the file name in Jupyter Notebook

I'm trying to import 100 CSV files from this Kaggle link: https://www.kaggle.com/natehenderson/nate-s-cryptocurrency-analysis/data

Each file contains the historical information for a different cryptocurrency.

Each file looks like this:

[Image: how the data looks]

For the current analysis I only need the Market Cap column of each file, along with the index, which is a timestamp.

So for each file the only column needed is Market Cap. I then need to combine these columns side by side and rename each one after the file it came from.

The final result should look like this:

[Image: final result]

Each column contains the Market Cap, each value should of course line up with the index, and the name of the column should be the same as the name of the CSV file.

Any ideas how I can do this?

The following should get you started. This assumes you have a folder containing all the CSV files and that each CSV file has the same format, namely something like:

Date,Open,High,Low,Close,Volume,Market Cap
"Sep 22, 2017",1.23,1.25,1.14,1.24,513898,12916700
"Sep 23, 2017",1.28,1.35,1.18,1.23,1700200,13448400

As you are trying to write data horizontally, you will need to gather all the data into memory before it can be written. This script reads the files one at a time and stores each row in a dictionary of dictionaries. The outer dictionary is keyed by date, and the inner one maps each currency that has an entry for that date to its market cap. The date is converted into a datetime object so that the rows can be sorted correctly when writing the output CSV file. As each file is read, its name is stored in a set so that a definitive list of currency names is available.
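To make the structure concrete, here is a minimal sketch of the dictionary of dictionaries on its own, without any file handling. The `parse_date` helper and the `sample_rows` values are hypothetical, made up purely for illustration, and the month names are assumed to be parsed under an English locale.

```python
from collections import defaultdict
from datetime import datetime

def parse_date(text):
    """Try the abbreviated month name first, then the full month name."""
    for fmt in ('%b %d, %Y', '%B %d, %Y'):
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f'Unrecognised date: {text!r}')

# Hypothetical sample rows: (currency name, date string, market cap)
sample_rows = [
    ('bitcoin', 'Sep 22, 2017', '100'),
    ('ethereum', 'Sep 22, 2017', '200'),
    ('bitcoin', 'June 22, 2017', '50'),
]

# Outer key: date. Inner key: currency name. Value: market cap.
all_data = defaultdict(dict)
for currency, date_text, market_cap in sample_rows:
    all_data[parse_date(date_text)][currency] = market_cap

# datetime keys sort chronologically, unlike the original strings
for date, entries in sorted(all_data.items()):
    print(date.date(), entries)
```

Because the keys are datetime objects, "June 22, 2017" sorts before "Sep 22, 2017", which would not be the case if the raw strings were compared.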

To write the output, the currency names are sorted and a csv.DictWriter is used to save all the values. A DictWriter has the benefit of writing an empty field for any currency that has no entry on a given date:

from collections import defaultdict
from operator import itemgetter
from datetime import datetime
import csv
import glob
import os

req_cols = itemgetter(0, 6)  # column 0 is Date, column 6 is Market Cap
all_data = defaultdict(dict)
currencies = set()
date_format1 = '%b %d, %Y'  # e.g. "Sep 22, 2017"
date_format2 = '%B %d, %Y'  # e.g. "June 22, 2017"

for csv_filename in glob.glob('*.csv'):
    with open(csv_filename, newline='') as f_input:
        currency_name = os.path.splitext(os.path.basename(csv_filename))[0]
        csv_input = csv.reader(f_input)
        header = next(csv_input)  # skip the header row
        currencies.add(currency_name)

        for row in csv_input:
            date, market_cap = req_cols(row)

            try:
                date = datetime.strptime(date, date_format1)
            except ValueError:      # Try "June 22, 2017"
                date = datetime.strptime(date, date_format2)

            all_data[date][currency_name] = market_cap

currencies = sorted(currencies)

with open('output.csv', 'w', newline='') as f_output:
    header = ['Date'] + currencies
    csv_output = csv.DictWriter(f_output, fieldnames=header)
    csv_output.writeheader()

    for date, entries in sorted(all_data.items()):
        entries['Date'] = date.strftime(date_format1)
        csv_output.writerow(entries)
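The way missing values are handled can be checked in isolation. The sketch below writes two made-up rows to an in-memory buffer instead of a file; the field names and values are hypothetical. `restval=''` is the DictWriter default and is only spelled out here to make the behaviour visible.

```python
import csv
import io

fieldnames = ['Date', 'bitcoin', 'ethereum']
rows = [
    {'Date': 'Sep 22, 2017', 'bitcoin': '100', 'ethereum': '200'},
    {'Date': 'Sep 23, 2017', 'bitcoin': '110'},  # no ethereum entry for this date
]

buffer = io.StringIO()
# restval is written for any fieldname missing from a row
writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval='')
writer.writeheader()
writer.writerows(rows)

output_text = buffer.getvalue()
print(output_text)
```

The second data row ends with a trailing comma, meaning the ethereum field is empty. Passing a different `restval`, for example `'NA'`, would write that marker instead.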
