Import multiple CSV files, select one column from each file & rename the column with the file name in Jupyter Notebooks
I'm trying to import 100 CSV files from this Kaggle link: https://www.kaggle.com/natehenderson/nate-s-cryptocurrency-analysis/data
Each file contains the historical information for a different cryptocurrency.
Each file looks like this:
For the current analysis I only need the Market Cap column of each file, along with the index, which is a timestamp.
So for each file the only column needed is Market Cap. Then I need to append each column and replace the column name with the name of the file.
The final result should look like this:
Each column contains the Market Cap values, and of course each value should correspond with the index. The name of each column should be the same as the name of its CSV file.
Any ideas how I can do this?
The following should get you started. It assumes you have a folder containing all the CSV files and that each CSV file has the same format, namely something like:
```
Date,Open,High,Low,Close,Volume,Market Cap
"Sep 22, 2017",1.23,1.25,1.14,1.24,513898,12916700
"Sep 23, 2017",1.28,1.35,1.18,1.23,1700200,13448400
```
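With this layout, Date is column 0 and Market Cap is column 6, which is why the script below uses `itemgetter(0, 6)`. A minimal check against the two sample rows above, held in memory with `io.StringIO` instead of a real file (the variable names are illustrative), shows that `csv.reader` keeps the quoted `"Sep 22, 2017"` as a single field:

```python
import csv
import io
from operator import itemgetter

# The sample rows shown above, held in memory instead of in a file
sample = '''Date,Open,High,Low,Close,Volume,Market Cap
"Sep 22, 2017",1.23,1.25,1.14,1.24,513898,12916700
"Sep 23, 2017",1.28,1.35,1.18,1.23,1700200,13448400
'''

# Column 0 is Date and column 6 is Market Cap in this layout
req_cols = itemgetter(0, 6)

reader = csv.reader(io.StringIO(sample))
header = next(reader)          # read (and skip) the header row
print(req_cols(header))
# ('Date', 'Market Cap')

# The quoted date containing a comma stays one field
pairs = [req_cols(row) for row in reader]
print(pairs)
# [('Sep 22, 2017', '12916700'), ('Sep 23, 2017', '13448400')]
```

If a file puts Market Cap in a different position, the indices passed to `itemgetter` would need to change accordingly.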
As you are trying to write the data horizontally, you will need to gather all of it into memory before anything can be written. This script reads each file one at a time and stores each row in a dictionary of dictionaries: the outer dictionary is keyed by `date`, and each inner dictionary holds the market cap of every currency that has an entry for that date. The `date` is converted into a `datetime` object so that the rows can be sorted chronologically when the output CSV file is written. As each file is read, its name is stored in a set so that a definitive list of currency names is available.
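The reason for converting to `datetime` can be seen in a small sketch, assuming the same two date formats as the script. The `parse_date` helper and the sample rows are hypothetical; the sketch builds the same date -> {currency: market cap} structure and compares sorting `datetime` keys against sorting the raw strings:

```python
from collections import defaultdict
from datetime import datetime

# Same two formats as the script: "Sep 22, 2017" and "June 22, 2017"
DATE_FORMATS = ('%b %d, %Y', '%B %d, %Y')


def parse_date(text):
    """Return a datetime for the first format that matches, else raise ValueError."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f'Unrecognised date: {text!r}')


# Hypothetical (currency, date, market cap) rows, as if read from two CSV files
rows = [
    ('bitcoin', 'Sep 22, 2017', '100'),
    ('bitcoin', 'Oct 01, 2017', '110'),
    ('ethereum', 'June 30, 2017', '30'),
    ('ethereum', 'Sep 22, 2017', '50'),
]

# Outer dict keyed by date; inner dict maps currency name -> market cap
all_data = defaultdict(dict)
for currency, date_text, market_cap in rows:
    all_data[parse_date(date_text)][currency] = market_cap

print(all_data[datetime(2017, 9, 22)])
# {'bitcoin': '100', 'ethereum': '50'}

# datetime keys sort chronologically...
print([d.strftime('%Y-%m-%d') for d in sorted(all_data)])
# ['2017-06-30', '2017-09-22', '2017-10-01']

# ...whereas the raw strings sort alphabetically, putting October before September
print(sorted({date_text for _, date_text, _ in rows}))
# ['June 30, 2017', 'Oct 01, 2017', 'Sep 22, 2017']
```

Note that `%b` and `%B` match month names in the current locale, so this assumes the default (English) C locale.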
For output, the currency names are sorted and a `DictWriter` is used to save all the values. This has the benefit of writing an empty value for any missing data:
```python
from collections import defaultdict
from operator import itemgetter
from datetime import datetime
import csv
import glob
import os

output_filename = 'output.csv'
req_cols = itemgetter(0, 6)             # Date and Market Cap columns
all_data = defaultdict(dict)            # date -> {currency name: market cap}
currencies = set()

date_format1 = '%b %d, %Y'              # e.g. "Sep 22, 2017"
date_format2 = '%B %d, %Y'              # e.g. "June 22, 2017"

for csv_filename in glob.glob('*.csv'):
    # Don't read back a previous run's output as if it were a currency
    if os.path.basename(csv_filename) == output_filename:
        continue

    with open(csv_filename, newline='') as f_input:
        currency_name = os.path.splitext(os.path.basename(csv_filename))[0]
        csv_input = csv.reader(f_input)
        header = next(csv_input)        # skip the header row
        currencies.add(currency_name)

        for row in csv_input:
            date, market_cap = req_cols(row)

            try:
                date = datetime.strptime(date, date_format1)
            except ValueError:          # Try "June 22, 2017"
                date = datetime.strptime(date, date_format2)

            all_data[date][currency_name] = market_cap

currencies = sorted(currencies)

with open(output_filename, 'w', newline='') as f_output:
    header = ['Date'] + currencies
    csv_output = csv.DictWriter(f_output, fieldnames=header)
    csv_output.writeheader()

    # Rows are written in date order; currencies missing on a date become empty cells
    for date, entries in sorted(all_data.items()):
        entries['Date'] = date.strftime(date_format1)
        csv_output.writerow(entries)
```
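The empty-cell behaviour comes from `DictWriter`'s `restval` parameter, which defaults to `''` and is used for any field name missing from a row dictionary. A small in-memory check (the currencies and values are made up for illustration) shows it:

```python
import csv
import io

# Hypothetical merged rows: ethereum has no value for Sep 22
rows = [
    {'Date': 'Sep 22, 2017', 'bitcoin': '100'},
    {'Date': 'Sep 23, 2017', 'bitcoin': '105', 'ethereum': '50'},
]

buffer = io.StringIO()  # stands in for the output file
writer = csv.DictWriter(buffer, fieldnames=['Date', 'bitcoin', 'ethereum'])
writer.writeheader()
writer.writerows(rows)  # missing keys are filled with restval, which defaults to ''

print(buffer.getvalue())
# Date,bitcoin,ethereum
# "Sep 22, 2017",100,
# "Sep 23, 2017",105,50
```

Passing something like `restval='NaN'` to `DictWriter` would write that marker instead of an empty cell, if a downstream tool prefers it.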