简体   繁体   English

循环通过 CSV 文件,执行 function,并连接 DataFrame 对象

[英]Looping through CSV files, performing a function, and concatenating DataFrame objects

I am attempting to loop through multiple CSVs, find the mean of multiple variable columns (6 to be specific) within the CSV, which in turn will output a single row of results (6,1) in dimensions, and append that to a dataframe object, for all.csv files in the folder. I am attempting to loop through multiple CSVs, find the mean of multiple variable columns (6 to be specific) within the CSV, which in turn will output a single row of results (6,1) in dimensions, and append that to a dataframe object,对于文件夹中的所有.csv文件。

I am quite new to programming, and the first code section below does not give the desired output.我对编程很陌生,下面的第一个代码部分没有给出所需的 output。

My attempted code is as such:我尝试的代码是这样的:


from pathlib import Path
import os

appended_data = []

folder= r"C:\Users\A\Desktop\Analysis\Field result\Height"

for file in Path(folder).glob('*.csv'):
    
    #Print filename to see which file is being processed
    #will use filenames later to add a column to appended_data to use as date-time
    print(os.path.basename(file))
    
    #Read csv
    df = pd.read_csv(file)
    
    #Averaging all variables of interest; mean, min, max, median etc.
    averages = df[df.columns[3:9]].mean()
             
    #Transpose to store as row
    Averages_Data = averages_pd.T
    
    # store DataFrame in list
    appended_data.append(Averages_Data)
    
#see pd.concat documentation for more info
appended_data = pd.concat(appended_data)

print(appended_data)

I am able to do it manually for the first two.csvs with the code below:我可以使用以下代码手动为前两个.csvs 执行此操作:

height = pd.read_csv(r"C:\Users\A\Desktop\Analysis\Field result\Height\height1.csv")
height.head()
height.info()
averages = height[height.columns[3:9]].mean()
print(averages)
Averages_Data = averages_pd.T
height2 = pd.read_csv(r"C:\Users\Al\Desktop\Analysis\Field result\Height\height2.csv")
averages2 = height2[height2.columns[3:9]].mean()
print(averages2)
Averages2 = averages2_pd.T
Averages2
Averages_Data
x = pd.concat([Averages_Data, Averages2])

and this gives the results I am looking for, but I can't seem to be able to generalise it, doing it manually will take too long.这给出了我正在寻找的结果,但我似乎无法概括它,手动执行将花费太长时间。 How would I do this?我该怎么做?

Would this work:这会起作用吗:

from pathlib import Path

folder = r"C:\Users\A\Desktop\Analysis\Field result\Height"
dfs = []
for file in Path(folder).glob("*.csv"):
    print(file.name)
    dfs.append(pd.DataFrame(pd.read_csv(file).iloc[:, 3:9].mean()).T)
df = pd.concat(dfs, ignore_index=True)

When I run it, with accordingly adjusted folder , for the following 5 sample files当我使用相应调整的folder运行它时,对于以下 5 个示例文件

from pathlib import Path
from random import randint

folder = "Test"
base = Path(folder)
base.mkdir(exist_ok=True)
cols_str = ",".join(f"col{i}" for i in range(10))
for n in range(1, 6):
    (base / f"file{n}.csv").write_text(
        cols_str + "\n"
        + ",".join(str(randint(1, 100)) for _ in range(10)) + "\n"
        + ",".join(str(randint(1, 100)) for _ in range(10))
    )

which essentially look like基本上看起来像

col0,col1,col2,col3,col4,col5,col6,col7,col8,col9
55,60,77,32,89,49,71,7,60,52
10,40,66,96,45,11,53,36,40,28

I get the following df :我得到以下df

   col3  col4  col5  col6  col7  col8
0  54.0  61.5  99.5  61.5  34.5  34.5
1  26.5  72.0  65.0  93.0  56.0  73.5
2  53.5  79.0  55.0  68.0  32.5  46.0
3  66.5  44.0  38.5  65.5  16.0  36.5
4  21.0  73.0  54.5  48.5  73.0  45.5

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM