
Import multiple CSV files, select one column from each file, and rename the column with the file name in Jupyter Notebook

I'm trying to import 100 CSV files from this Kaggle link: https://www.kaggle.com/natehenderson/nate-s-cryptocurrency-analysis/data

Each file contains the historical information for a different cryptocurrency.

Each file looks like this:

[Image: how the data looks]

For the current analysis I only need the Market Cap column of each file, along with the index, which is a timestamp.

So for each file, the only column needed is Market Cap. I then need to append each of these columns to a single table and replace the name of each column with the name of its file.

The final result should look like this:

[Image: final result]

Each column should contain the Market Cap, and of course each value should correspond with the index. The name of each column should be the same as the name of its CSV file.

Any ideas how I can do this?

The following should get you started. This assumes you have a folder containing all the CSV files and that each CSV file has the same format, namely something like:

Date,Open,High,Low,Close,Volume,Market Cap
"Sep 22, 2017",1.23,1.25,1.14,1.24,513898,12916700
"Sep 23, 2017",1.28,1.35,1.18,1.23,1700200,13448400

As you are trying to write the data horizontally, you will need to gather all of it into memory before it can be written. This script reads the files one at a time and stores each row in a dictionary of dictionaries. The outer dictionary is keyed by date, and the inner one holds all the currencies that have an entry for that date. Each date is converted into a datetime object so that the rows can be sorted correctly when writing the output CSV file. As each file is read, its name is stored in a set so that a definitive list of currency names is available.
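As a rough illustration of the parsing and grouping steps just described, the sketch below uses a hypothetical helper `parse_date` and made-up rows (the currency names and values are invented for the example, not taken from the dataset):

```python
from collections import defaultdict
from datetime import datetime

# Formats tried in order: "Sep 22, 2017", then "June 22, 2017"
DATE_FORMATS = ('%b %d, %Y', '%B %d, %Y')


def parse_date(text):
    """Hypothetical helper: try each known date format in turn."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f'Unrecognised date: {text!r}')


# Made-up sample rows: (currency name, date text, market cap)
rows = [
    ('bitcoin', 'Sep 22, 2017', '100'),
    ('ethereum', 'Sep 22, 2017', '50'),
    ('bitcoin', 'June 22, 2017', '90'),
]

all_data = defaultdict(dict)  # outer key: date, inner key: currency name

for currency, date_text, market_cap in rows:
    all_data[parse_date(date_text)][currency] = market_cap

# Sorting works chronologically because the keys are datetime objects
for date, entries in sorted(all_data.items()):
    print(date.date(), entries)
```

Sorting the raw date strings instead would order them alphabetically by month name, which is why the conversion to datetime matters here.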

To write the output, the currency names are sorted and a DictWriter is used to save all the values. This has the benefit of writing empty values for any missing data:

from collections import defaultdict
from operator import itemgetter
from datetime import datetime
import csv
import glob
import os

req_cols = itemgetter(0, 6)
all_data = defaultdict(dict)
currencies = set()
date_format1 = '%b %d, %Y'  # e.g. "Sep 22, 2017"
date_format2 = '%B %d, %Y'  # e.g. "June 22, 2017"

for csv_filename in glob.glob('*.csv'):
    with open(csv_filename, newline='') as f_input:
        currency_name = os.path.splitext(os.path.basename(csv_filename))[0]
        csv_input = csv.reader(f_input)
        next(csv_input)  # Skip the header row
        currencies.add(currency_name)

        for row in csv_input:
            date, market_cap = req_cols(row)

            try:
                date = datetime.strptime(date, date_format1)
            except ValueError:      # Try "June 22, 2017"
                date = datetime.strptime(date, date_format2)

            all_data[date][currency_name] = market_cap

currencies = sorted(currencies)

with open('output.csv', 'w', newline='') as f_output:
    header = ['Date'] + currencies
    csv_output = csv.DictWriter(f_output, fieldnames=header)
    csv_output.writeheader()

    for date, entries in sorted(all_data.items()):
        entries['Date'] = date.strftime(date_format1)
        csv_output.writerow(entries)
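
The empty-value behaviour can also be checked in isolation. This is a minimal sketch with invented currency names and values, writing to an in-memory buffer rather than a file; it relies on `DictWriter`'s default `restval` of an empty string:

```python
import csv
import io

fieldnames = ['Date', 'bitcoin', 'ethereum']
buffer = io.StringIO()  # Stands in for output.csv

writer = csv.DictWriter(buffer, fieldnames=fieldnames)
writer.writeheader()
# 'ethereum' has no entry for this date, so its cell is left empty
writer.writerow({'Date': 'Sep 22, 2017', 'bitcoin': '100'})
writer.writerow({'Date': 'Sep 23, 2017', 'bitcoin': '110', 'ethereum': '55'})

lines = buffer.getvalue().splitlines()
print(lines)
```

If a different placeholder is preferred for missing data, it can be set by passing `restval` when creating the writer.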
