
Extract data date-wise using Python and save a separate CSV per date

I have to extract data date-wise and save a separate CSV for each distinct date. The "time" column of the CSV file holds timestamps in this format: 2018-03-26T16:09:10.024101278Z.

The dataset has more than 100k rows recorded at different times. I tried building a DataFrame (column names, for reference: name, time, id, ddr, version, readings):

import pandas as pd

dataset_CT = pd.read_csv("out_1.csv")
dataset_CT['Dates'] = pd.to_datetime(dataset_CT['time']).dt.date
dataset_CT['Time'] = pd.to_datetime(dataset_CT['time']).dt.time
dataset_CT.sort_values(by='Dates', axis=0, inplace=True)
dataset_CT.set_index(keys=['Dates'], drop=False,inplace=True)

Date_list=dataset_CT['Dates'].unique().tolist()

This gave me a Date_list like this: [datetime.date(2018, 3, 26), datetime.date(2018, 3, 31)]. I then tried to select a single date:

Date_set = dataset_CT.loc[dataset_CT.Dates=='(2018, 3, 26)']

I received an empty DataFrame, as shown below:

      name  time id ddr version readings Dates  Time
Dates
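
An empty result is what one would expect here: Series.dt.date stores datetime.date objects, and none of them is equal to the string '(2018, 3, 26)'. The following is a minimal sketch of the difference, assuming a small made-up frame (the rows and values are hypothetical, standing in for out_1.csv):

```python
import datetime
import pandas as pd

# Hypothetical sample rows standing in for out_1.csv
df = pd.DataFrame({
    "time": [
        "2018-03-26T16:09:10.024101278Z",
        "2018-03-31T08:00:00.000000000Z",
    ],
    "readings": [1.0, 2.0],
})
df["Dates"] = pd.to_datetime(df["time"]).dt.date

# A string that merely looks like a tuple never equals a datetime.date
empty = df.loc[df["Dates"] == "(2018, 3, 26)"]

# Comparing against a real datetime.date object selects the rows
match = df.loc[df["Dates"] == datetime.date(2018, 3, 26)]

print(len(empty), len(match))
```

Since Date_list already contains datetime.date objects, its elements can be used directly on the right-hand side of the comparison.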

Does comparing against a date string work?

Date_set = dataset_CT.loc[dataset_CT.Dates=='2018-03-26']

If that does not work, try changing Series.dt.date:

dataset_CT['Dates'] = pd.to_datetime(dataset_CT['time']).dt.date
Date_set = dataset_CT.loc[dataset_CT.Dates=='2018-03-26']

to Series.dt.floor, which keeps the datetime dtype and sets the time part to midnight:

dataset_CT['Dates'] = pd.to_datetime(dataset_CT['time']).dt.floor('d')
Date_set = dataset_CT.loc[dataset_CT.Dates=='2018-03-26']
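
Once the date can be derived per row, the original goal (one CSV per date) may not need a per-date filter at all. Here is a minimal sketch using DataFrame.groupby; the helper name split_by_date, its parameters, and the sample rows are assumptions made for illustration, not part of the question:

```python
import os
import tempfile
import pandas as pd

def split_by_date(df, time_col="time", out_dir=".", prefix="out_"):
    """Write one CSV per calendar date and return the paths written."""
    # One datetime.date per row, used only as the grouping key
    dates = pd.to_datetime(df[time_col]).dt.date
    written = []
    for day, group in df.groupby(dates):
        path = os.path.join(out_dir, f"{prefix}{day}.csv")
        group.to_csv(path, index=False)   # original columns only
        written.append(path)
    return written

# Hypothetical sample rows standing in for out_1.csv
sample = pd.DataFrame({
    "name": ["a", "b", "c"],
    "time": [
        "2018-03-26T16:09:10.024101278Z",
        "2018-03-26T17:00:00.000000000Z",
        "2018-03-31T08:00:00.000000000Z",
    ],
})
out_dir = tempfile.mkdtemp()              # keep the demo out of the cwd
files = split_by_date(sample, out_dir=out_dir)
print([os.path.basename(f) for f in files])
```

This loads the whole file into memory, which should be acceptable for roughly 100k rows; the row-by-row approach in the other answer avoids that.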

Since you read your input with default parameters, I will assume a comma (,) separator and one header line. IMHO pandas is not required here. It is enough to read the file one row at a time and write each row to the CSV file corresponding to its date.

The caveats: each output CSV file needs the header, and a new output file must be created for every new date. A collections.defaultdict with a custom default function is enough to meet those two requirements.
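
One detail worth spelling out: the default function of a collections.defaultdict is called with no arguments, so it cannot receive the missing key directly and has to read it from an enclosing variable. The sketch below illustrates this with plain strings instead of files; the names current_key, make_default, and KeyedDefault are hypothetical. It also shows a dict subclass with __missing__, which does receive the key, as a possible alternative.

```python
import collections

current_key = None  # updated by the loop before each lookup

def make_default():
    # Called without arguments: the key comes from the enclosing scope
    return "created for {}".format(current_key)

table = collections.defaultdict(make_default)
for current_key in ["2018-03-26", "2018-03-31", "2018-03-26"]:
    table[current_key]          # first access per key triggers make_default

class KeyedDefault(dict):
    """Alternative: __missing__ is passed the missing key itself."""
    def __missing__(self, key):
        value = self[key] = "created for {}".format(key)
        return value

keyed = KeyedDefault()
keyed["2018-03-26"]

print(dict(table))
print(keyed)
```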

The following code reads an input CSV file named "out_1.csv" and writes its content to a set of files named like out_2018-03-26.csv, where the date in the name is the date of all rows in that file:

import collections
import csv

with open("out_1.csv") as fdin:
    def get_defaults():
        """returns a pair (csv_writer, file_object) for date dat initialized with header"""
        filename = 'out_{}.csv'.format(dat)   # dat is read from the enclosing scope
        fd = open(filename, "w", newline='')
        fd.write(header)
        return (csv.writer(fd), fd)
    outfiles = collections.defaultdict(get_defaults)
    rd = csv.reader(fdin)
    header = next(fdin)             # store the header to later initialize output files
    for row in rd:
        dat = row[1][:10]           # extract the date
        wr = outfiles[dat][0]
        wr.writerow(row)            # and write the row to the appropriate output file
    # close the output files
    for i in outfiles:
        outfiles[i][1].close()

On second thought, the code above could keep too many files open. Here is an improved version that keeps files open only for the 3 most recently encountered dates (tested):

import collections
import csv

with open("out_1.csv") as fdin:
    cache = collections.deque()
    seen = set()
    def get_defaults():
        """returns a pair (csv_writer, file_object) for date dat initialized with header"""
        filename = 'out_{}.csv'.format(dat)   # dat is read from the enclosing scope
        fd = open(filename, 'a' if dat in seen else 'w', newline='')
        if 0 == fd.tell():          # file is currently empty: write header
            fd.write(header)
        ret = (csv.writer(fd), fd)
        cache.append(dat)
        seen.add(dat)
        if len(cache) > 3:          # only keep 3 open files
            old = cache.popleft()
            print("Closing", old)
            outfiles[old][1].close()
            del outfiles[old]
        return ret

    outfiles = collections.defaultdict(get_defaults)
    rd = csv.reader(fdin)
    header = next(fdin)   # store the header to later initialize output files
    for row in rd:
        dat = row[1][:10] # extract the date
        wr = outfiles[dat][0]
        wr.writerow(row)  # and write the row to the appropriate output file
    # close the currently opened output files
    for i in outfiles:
        outfiles[i][1].close()
