[英]Looping through CSV files, performing a function, and concatenating DataFrame objects
I am attempting to loop through multiple CSVs, find the mean of multiple variable columns (6 to be specific) within the CSV, which in turn will output a single row of results (6,1) in dimensions, and append that to a dataframe object, for all.csv files in the folder. I am attempting to loop through multiple CSVs, find the mean of multiple variable columns (6 to be specific) within the CSV, which in turn will output a single row of results (6,1) in dimensions, and append that to a dataframe object,对于文件夹中的所有.csv文件。
I am quite new to programming, and the first code section below does not give the desired output.我对编程很陌生,下面的第一个代码部分没有给出所需的 output。
My attempted code is as such:我尝试的代码是这样的:
from pathlib import Path
import os
appended_data = []
folder= r"C:\Users\A\Desktop\Analysis\Field result\Height"
for file in Path(folder).glob('*.csv'):
#Print filename to see which file is being processed
#will use filenames later to add a column to appended_data to use as date-time
print(os.path.basename(file))
#Read csv
df = pd.read_csv(file)
#Averaging all variables of interest; mean, min, max, median etc.
averages = df[df.columns[3:9]].mean()
#Transpose to store as row
Averages_Data = averages_pd.T
# store DataFrame in list
appended_data.append(Averages_Data)
#see pd.concat documentation for more info
appended_data = pd.concat(appended_data)
print(appended_data)
I am able to do it manually for the first two.csvs with the code below:我可以使用以下代码手动为前两个.csvs 执行此操作:
height = pd.read_csv(r"C:\Users\A\Desktop\Analysis\Field result\Height\height1.csv")
height.head()
height.info()
averages = height[height.columns[3:9]].mean()
print(averages)
Averages_Data = averages_pd.T
height2 = pd.read_csv(r"C:\Users\Al\Desktop\Analysis\Field result\Height\height2.csv")
averages2 = height2[height2.columns[3:9]].mean()
print(averages2)
Averages2 = averages2_pd.T
Averages2
Averages_Data
x = pd.concat([Averages_Data, Averages2])
and this gives the results I am looking for, but I can't seem to be able to generalise it, doing it manually will take too long.这给出了我正在寻找的结果,但我似乎无法概括它,手动执行将花费太长时间。 How would I do this?
我该怎么做?
Would this work:这会起作用吗:
from pathlib import Path
folder = r"C:\Users\A\Desktop\Analysis\Field result\Height"
dfs = []
for file in Path(folder).glob("*.csv"):
print(file.name)
dfs.append(pd.DataFrame(pd.read_csv(file).iloc[:, 3:9].mean()).T)
df = pd.concat(dfs, ignore_index=True)
When I run it, with accordingly adjusted folder
, for the following 5 sample files当我使用相应调整的
folder
运行它时,对于以下 5 个示例文件
from pathlib import Path
from random import randint
folder = "Test"
base = Path(folder)
base.mkdir(exist_ok=True)
cols_str = ",".join(f"col{i}" for i in range(10))
for n in range(1, 6):
(base / f"file{n}.csv").write_text(
cols_str + "\n"
+ ",".join(str(randint(1, 100)) for _ in range(10)) + "\n"
+ ",".join(str(randint(1, 100)) for _ in range(10))
)
which essentially look like基本上看起来像
col0,col1,col2,col3,col4,col5,col6,col7,col8,col9
55,60,77,32,89,49,71,7,60,52
10,40,66,96,45,11,53,36,40,28
I get the following df
:我得到以下
df
:
col3 col4 col5 col6 col7 col8
0 54.0 61.5 99.5 61.5 34.5 34.5
1 26.5 72.0 65.0 93.0 56.0 73.5
2 53.5 79.0 55.0 68.0 32.5 46.0
3 66.5 44.0 38.5 65.5 16.0 36.5
4 21.0 73.0 54.5 48.5 73.0 45.5
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.