I have a directory of multiple (500+).csv files, each with multiple columns. For each.csv file, I want to calculate the number of observations, as well as both the sum and mean for one specific column. If I was doing this for one.csv file, I would import the.csv as a dataframe and use something really like:
len(df)
df["column_name"].sum()
df["column_name"].mean()
However, I want to perform these three calculations on each individual.csv file and save them all out to a new dataframe, along with the name of the original.csv file. For example, for 3 files called 'file1.csv', 'file2.csv' and 'file3.csv', I want to create a new dataframe that looks like:
Column No. Observations Sum Mean
file1 10 120 12
file2 20 100 5
file3 5 50 10
I've created a new dataframe:
new = pd.DataFrame(columns=['Column', 'No. Observations', 'Sum', 'Mean'])
I've used os.walk to list all the files in the directory:
import os
path = "/Users/user/directory_name"
for root, directories, files in os.walk(path, topdown=False):
for name in files:
print(os.path.join(root, name))
So, at the moment I can generate the list of files in the directory, I can perform the necessary calculations on each file and I can create a new dataframe that contains the columns I want.
What I can't do is iterate over each individual file in the directory, perform the calculations, and save the results to the new dataframe. How do I put it all together?
For each file create tuple, append to list out
and last pass to DataFrame
constructor:
import os
path = "/Users/user/directory_name"
out = []
for root, directories, files in os.walk(path, topdown=False):
for name in files:
file = os.path.join(root, name)
df = pd.read_csv(file)
size = len(df)
s = df["column_name"].sum()
m = df["column_name"].mean()
out.append((name, size, s, m))
new = pd.DataFrame(out, columns=['Column', 'No. Observations', 'Sum', 'Mean'])
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.