
Read and append every nth row from CSV files in a folder (Python)

I have a folder containing 30 files, each of them containing thousands of rows. I would like to loop through the files, creating a dataframe containing each 10th row from each file. The resulting dataframe would contain rows 10, 20, 30, 40, etc. from the first file; rows 10, 20, 30, 40, etc. from the second file and so on.

For the moment I have:

import glob
import pandas as pd

all_files = glob.glob("DK_Frequency/*.csv")
li = []
for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0)
    li.append(df)

This appends the dataframe read from each file in the folder to a list, but I don't know how to go further.

Any ideas? Thank you in advance.

This slices every 10th row out of each df using iloc and adds it to final_df. At the end of the loop, final_df should contain all the necessary rows.

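import glob
import pandas as pd

all_files = glob.glob("DK_Frequency/*.csv")
li = []
for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0)
    # iloc[::10] keeps positions 0, 10, 20, ...; use iloc[9::10] for the 10th, 20th, ... rows
    li.append(df.iloc[::10])
# DataFrame.append returned a new frame (and was removed in pandas 2.0), so concatenate once instead
final_df = pd.concat(li, ignore_index=True)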
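
To see which rows such a slice actually keeps, here is a small sketch on a made-up 30-row frame (the `row` column is only for illustration, it is not from the question's data). `iloc[::10]` starts at the first row, whereas the question asks for the 10th, 20th, 30th rows, which would be `iloc[9::10]`:

```python
import pandas as pd

# Hypothetical stand-in for one CSV file: 30 rows numbered 1..30
df = pd.DataFrame({"row": range(1, 31)})

from_first = df.iloc[::10]    # positions 0, 10, 20
from_tenth = df.iloc[9::10]   # positions 9, 19, 29

print(from_first["row"].tolist())  # [1, 11, 21]
print(from_tenth["row"].tolist())  # [10, 20, 30]
```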

Assuming that all the CSV files have the same structure, you could do the following:

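# -*- coding: utf-8 -*-
import glob
import pandas as pd

all_files = glob.glob("DK_Frequency/*.csv")

# cols_to_take is the list of column headers
cols_to_take = pd.read_csv(all_files[0]).columns

## create an empty dataframe with those columns
big_df = pd.DataFrame(columns=cols_to_take)

for csv in all_files:
    df = pd.read_csv(csv)
    # keep the rows whose index label is a multiple of 10 (0, 10, 20, ...)
    indices = list(filter(lambda x: x % 10 == 0, df.index))
    # drop=True avoids adding the old index as an extra "index" column
    df = df.loc[indices].reset_index(drop=True)

    ## append df to big_df (DataFrame.append was removed in pandas 2.0)
    big_df = pd.concat([big_df, df], ignore_index=True)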
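
The `filter`/`lambda` step can also be written as a vectorized boolean mask on the index. A minimal sketch, assuming the default integer index that read_csv produces (the frame and its `value` column are made up for illustration):

```python
import pandas as pd

# Hypothetical data: 25 rows with index labels 0..24
df = pd.DataFrame({"value": range(100, 125)})

# Selection as in the answer above: filter the index labels, then use .loc
indices = list(filter(lambda x: x % 10 == 0, df.index))
via_filter = df.loc[indices]

# Same selection as a boolean mask computed on the whole index at once
via_mask = df[df.index % 10 == 0]

print(via_mask["value"].tolist())  # [100, 110, 120]
```

Both forms select by index label, so they only match positional slicing while the index is the default 0, 1, 2, ... range.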

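pandas' read_csv can keep only every 10th line through its skiprows parameter, which accepts a callable that is evaluated on each line index and skips the line when it returns True. So you could use: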

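import glob
import pandas as pd

all_files = glob.glob("DK_Frequency/*.csv")
li = []
for filename in all_files:
    # line 0 (the header) is kept, then file lines 10, 20, 30, ..., i.e. data rows 10, 20, 30, ...
    df = pd.read_csv(filename, index_col=None, header=0, skiprows=lambda x: x % 10 != 0)
    li.append(df)
global_df = pd.concat(li, ignore_index=True)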
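
As a quick check of what the skiprows callable keeps, here is a sketch that reads an in-memory CSV instead of a file (the content and the `row` column name are made up for illustration):

```python
import io
import pandas as pd

# Hypothetical CSV content: a header line followed by data rows 1..35
csv_text = "row\n" + "\n".join(str(i) for i in range(1, 36)) + "\n"

# Keep line 0 (the header) and every line whose index is a multiple of 10
df = pd.read_csv(io.StringIO(csv_text), header=0,
                 skiprows=lambda x: x % 10 != 0)

print(df["row"].tolist())  # [10, 20, 30]
```

Because the rows are dropped while the file is being read, the full file never has to be loaded into a dataframe first, which may matter for large files.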


 