I'm trying to import 100 CSV files from this Kaggle link: https://www.kaggle.com/natehenderson/nate-s-cryptocurrency-analysis/data
Each file contains the historical information for a different cryptocurrency.
Each file looks like this:
For the current analysis I only need the Market Cap column of each file, along with the index, which is a timestamp.
So for each file the only column needed is Market Cap. I then need to append each of these columns and rename it to the name of the file it came from.
The final result should look like this:
Each column contains the Market Cap, and of course each value should line up with the index. The name of each column should be the same as the name of its CSV file.
Any ideas how I can do this?
The following should get you started. This assumes you have a folder containing all the CSV files and that each CSV file has the same format, namely something like:
Date,Open,High,Low,Close,Volume,Market Cap
"Sep 22, 2017",1.23,1.25,1.14,1.24,513898,12916700
"Sep 23, 2017",1.28,1.35,1.18,1.23,1700200,13448400
As you are trying to write data horizontally, you will need to gather all the data into memory before it can be written. This script reads each file one at a time and assigns each row to a dictionary of dictionaries. The outer dictionary is keyed by date, and the inner one holds all the currencies that have an entry for that date. The date is converted into a datetime object to ensure that the rows can be correctly sorted when writing the output CSV file. As each file is read, its name is stored in a set so that a definitive list of currency names is available.
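To make that structure concrete, here is a minimal sketch of just the date parsing and the dictionary of dictionaries. The helper name parse_date, the currency names coin_a and coin_b, and the sample values are made up for illustration and are not part of the script.

```python
from collections import defaultdict
from datetime import datetime


def parse_date(text):
    """Parse 'Sep 22, 2017' or 'June 22, 2017' (hypothetical helper)."""
    for fmt in ('%b %d, %Y', '%B %d, %Y'):
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f'Unrecognised date: {text!r}')


# Outer key: date, inner key: currency name, value: market cap as read from the CSV
all_data = defaultdict(dict)
all_data[parse_date('Sep 23, 2017')]['coin_a'] = '13448400'
all_data[parse_date('Sep 22, 2017')]['coin_a'] = '12916700'
all_data[parse_date('Sep 22, 2017')]['coin_b'] = '500'

# datetime keys sort chronologically, which the raw date strings would not
dates = sorted(all_data)
```

Here dates comes back with Sep 22 before Sep 23 even though the rows were inserted in the opposite order, and the Sep 22 entry holds both currencies.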
To output, the currency names are sorted and a DictWriter is used to save all the values. This has the benefit of storing empty values for any missing data:
from collections import defaultdict
from operator import itemgetter
from datetime import datetime
import csv
import glob
import os

req_cols = itemgetter(0, 6)         # Date and Market Cap columns
all_data = defaultdict(dict)        # date -> {currency name: market cap}
currencies = set()
date_format1 = '%b %d, %Y'      # e.g. "Sep 22, 2017"
date_format2 = '%B %d, %Y'      # e.g. "June 22, 2017"

for csv_filename in glob.glob('*.csv'):
    with open(csv_filename, newline='') as f_input:
        currency_name = os.path.splitext(os.path.basename(csv_filename))[0]
        csv_input = csv.reader(f_input)
        header = next(csv_input)    # skip the header row
        currencies.add(currency_name)

        for row in csv_input:
            date, market_cap = req_cols(row)

            try:
                date = datetime.strptime(date, date_format1)
            except ValueError:      # Try "June 22, 2017"
                date = datetime.strptime(date, date_format2)

            all_data[date][currency_name] = market_cap

currencies = sorted(currencies)

with open('output.csv', 'w', newline='') as f_output:
    header = ['Date'] + currencies
    csv_output = csv.DictWriter(f_output, fieldnames=header)
    csv_output.writeheader()

    for date, entries in sorted(all_data.items()):
        entries['Date'] = date.strftime(date_format1)
        csv_output.writerow(entries)
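The point about missing data can be checked in isolation. The following is a small sketch that writes to an in-memory buffer instead of a file; the currency names coin_a and coin_b and the values are invented for the example. It relies on DictWriter's restval parameter, which defaults to an empty string.

```python
import csv
import io

fieldnames = ['Date', 'coin_a', 'coin_b']
rows = [
    {'Date': 'Sep 22, 2017', 'coin_a': '12916700'},  # no coin_b entry for this date
    {'Date': 'Sep 23, 2017', 'coin_a': '13448400', 'coin_b': '500'},
]

buffer = io.StringIO(newline='')
writer = csv.DictWriter(buffer, fieldnames=fieldnames)  # restval defaults to ''
writer.writeheader()
writer.writerows(rows)

output = buffer.getvalue()
print(output)
```

The first data row ends with a trailing comma, i.e. an empty coin_b cell, rather than raising an error. If a placeholder such as 0 or NaN is preferred, it could be passed as restval when creating the DictWriter.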