
How to copy rows from multiple csv files to new files in pandas?

I have 10 csv files. I want to copy the first row from all csv files and save it as a new csv file, then copy the second row from all csv files and save it as a second csv file, and so on. My code below only works for the first row; the other rows display NaN. Where is my error?

Code

import pandas as pd
import datetime
import glob

path = r'/Jupyter_Works/new_csv'
all_files = glob.glob(path + "/*.csv")

date_time = datetime.datetime(2018, 1, 1)
index = pd.date_range(start='1/1/2018', periods= 8760, freq='H')

columns = ['Lat','Lon','Alt','Temperature','Relative Humidity','Wind speed','Wind direction','Short-wave irradiation']
dfcsv = pd.DataFrame(index=index, columns=columns)

for filename in all_files:
    df = pd.read_csv(filename, index_col='time', header=0)
    dfcsv.iloc[0] = df.iloc[0]

dfcsv

Result

                       Lat    Lon  Alt  Temperature  Relative Humidity  Wind speed  Wind direction  Short-wave irradiation
2018-01-01 00:00:00  31.03  49.36   99       285.56              52.82        2.95           128.5                       0
2018-01-01 01:00:00    NaN    NaN  NaN          NaN                NaN         NaN             NaN                     NaN
2018-01-01 02:00:00    NaN    NaN  NaN          NaN                NaN         NaN             NaN                     NaN
2018-01-01 03:00:00    NaN    NaN  NaN          NaN                NaN         NaN             NaN                     NaN
2018-01-01 04:00:00    NaN    NaN  NaN          NaN                NaN         NaN             NaN                     NaN

The loop in the question only ever assigns to dfcsv.iloc[0], so each file overwrites the first row and all other rows stay NaN. Instead, first create one big DataFrame with a list comprehension and concat, then loop over the unique index values, select the matching rows with loc, and write each selection to its own file with DataFrame.to_csv. This works because each DataFrame has a unique index, so selecting by one unique index value picks the rows at the same position in all files.

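import glob
import os

import pandas as pd

path = r'/home/nickan/Jupyter_Works/new_csv'
all_files = glob.glob(path + "/*.csv")

# read every file with the parsed 'time' column as index, then stack them
dfs = [pd.read_csv(fp, index_col='time', parse_dates=['time']) for fp in all_files]
df = pd.concat(dfs)

os.makedirs('csv', exist_ok=True)  # to_csv does not create the output folder

for x in df.index.unique():
    # df.loc[[x]] selects the rows with timestamp x from all files;
    # index=False drops the duplicated index from the output
    df.loc[[x]].to_csv(f'csv/file_{x.strftime("%Y-%m-%d_%H")}.csv', index=False)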
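
To see why selecting by a unique index value collects the same row from every file, here is a minimal in-memory sketch. The frames `site_a` and `site_b` are hypothetical stand-ins for two of the CSV files; the reduced column set and most of the values are made up for illustration only.

```python
import pandas as pd

# Three hourly timestamps shared by both hypothetical inputs.
times = pd.DatetimeIndex(
    ["2018-01-01 00:00", "2018-01-01 01:00", "2018-01-01 02:00"], name="time"
)

# Stand-ins for two CSV files (illustrative values, not real data).
site_a = pd.DataFrame(
    {"Lat": [31.03] * 3, "Temperature": [285.56, 285.10, 284.70]}, index=times
)
site_b = pd.DataFrame(
    {"Lat": [32.50] * 3, "Temperature": [280.00, 279.50, 279.10]}, index=times
)

# After concat, every timestamp appears once per input frame.
combined = pd.concat([site_a, site_b])

# Selecting with a list of labels always returns a DataFrame
# holding the row for that timestamp from each input.
first_hour = combined.loc[[times[0]]]
print(first_hour)
```

Passing a list to `loc` (`[[...]]`) is a deliberate choice here: with a single label, `loc` would return a Series whenever a timestamp occurs in only one file, and `to_csv` would then write a differently shaped file.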

EDIT:

If memory is a problem, an alternative solution is to loop over each row of each DataFrame and write to the output files in append mode:

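# the 'out' folder must already exist
for i, fp in enumerate(all_files):
    df = pd.read_csv(fp, index_col='time', parse_dates=['time'])
    for x in df.index:
        f = f'out/file_{x.strftime("%Y-%m-%d_%H")}.csv'
        if i == 0:
            # first file: create the output file and write the header
            df.loc[[x]].to_csv(f, index=False)
        else:
            # later files: append the row without repeating the header
            df.loc[[x]].to_csv(f, index=False, header=False, mode='a')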
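
The append-mode idea can also be tried end to end on throwaway data. The sketch below is a variation rather than the code above: the helper name `split_rows_by_timestamp` is hypothetical, and it decides whether to write the header by checking if the output file already exists instead of testing `i == 0`, which assumes nothing about which input contains which timestamps. The two tiny input files are generated in a temporary folder purely for illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd


def split_rows_by_timestamp(input_files, out_dir):
    """Append each row of each input CSV to a per-timestamp output CSV."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for fp in input_files:
        # Only one input file is held in memory at a time.
        df = pd.read_csv(fp, index_col="time", parse_dates=["time"])
        for ts in df.index:
            target = out_dir / f"file_{ts.strftime('%Y-%m-%d_%H')}.csv"
            # Write the header only when the output file is first created.
            is_new = not target.exists()
            df.loc[[ts]].to_csv(target, index=False, header=is_new, mode="a")


with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    # Two made-up input files with the same two hourly timestamps.
    for name, lat in [("a.csv", 31.03), ("b.csv", 32.5)]:
        pd.DataFrame(
            {
                "time": ["2018-01-01 00:00:00", "2018-01-01 01:00:00"],
                "Lat": [lat, lat],
            }
        ).to_csv(tmp / name, index=False)

    split_rows_by_timestamp(sorted(tmp.glob("*.csv")), tmp / "out")

    written = sorted(p.name for p in (tmp / "out").glob("*.csv"))
    result = pd.read_csv(tmp / "out" / "file_2018-01-01_00.csv")
    print(written)
    print(result)
```

Because the files are opened in append mode, running this twice against the same output folder would duplicate rows, so the output folder should be empty before each run.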

