
Python: sum a column across multiple CSV files, average per week, and average per branch

I'm new to Python, and I have two CSV files in a directory:

file1.csv

Id place,Date and hour, Value
1,2018.09.17.12.54,200000
2,2018.09.18.14.16,150000
1,2018.09.19.15.06,78000
3,2018.09.17.16.26,110000
2,2018.09.20.13.54,200000
3,2018.09.17.14.16,150000
1,2018.09.21.12.54,200000

file2.csv

Id place,Date and hour, Value
1,2018.09.24.12.54,200000
3,2018.09.24.14.16,150000
1,2018.09.24.15.06,78000
2,2018.09.26.16.26,110000
1,2018.09.27.12.54,200000
3,2018.09.25.14.16,150000
1,2018.09.28.12.54,200000
3,2018.09.28.14.16,150000

I want to read all the CSV files in the directory and save the following information to new CSV files:

  • the sum of the Value column for each place, across all files

Output:

Id place, Value
1, 1 156 000
2, 460 000
3, 710 000
  • the average Value per week

Output:

Week, average Value
1 ,  155428,57   (1088000 / 7)
2 ,  154750   (1238000 / 8)
  • the average Value per week for each place

Output:

Id place,Week, average Value
1, 1 , 159 333  (478000 / 3)
2, 1 , 175 000  (350000 / 2)
3, 1 , 130 000  (260 000/ 2)
1, 2 , 169 500  (678000 / 4) 
2, 2 , 110 000  (110000 / 1)
3, 2 , 150 000  (450000 / 3)

I have no idea how to do this. Thanks in advance.

I suggest using pandas:

import glob
import pandas as pd

#get all files
files = glob.glob('files/*.csv')
#create list of DataFrames, stripping whitespace around the csv headers (e.g. ' Value')
dfs = [pd.read_csv(fp).rename(columns=lambda x: x.strip()) for fp in files]
#join together all files
df = pd.concat(dfs, ignore_index=True)

#convert column to datetimes
df['Date and hour'] = pd.to_datetime(df['Date and hour'], format='%Y.%m.%d.%H.%M')
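#convert to ISO week numbers, then factorize (sorted) so the earliest week becomes 1
#(Series.dt.weekofyear was removed in pandas 2.0; dt.isocalendar().week needs pandas >= 1.1)
df['week'] = pd.factorize(df['Date and hour'].dt.isocalendar().week, sort=True)[0] + 1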
print (df)
    Id place       Date and hour   Value  week
0          1 2018-09-17 12:54:00  200000     1
1          2 2018-09-18 14:16:00  150000     1
2          1 2018-09-19 15:06:00   78000     1
3          3 2018-09-17 16:26:00  110000     1
4          2 2018-09-20 13:54:00  200000     1
5          3 2018-09-17 14:16:00  150000     1
6          1 2018-09-21 12:54:00  200000     1
7          1 2018-09-24 12:54:00  200000     2
8          3 2018-09-24 14:16:00  150000     2
9          1 2018-09-24 15:06:00   78000     2
10         2 2018-09-26 16:26:00  110000     2
11         1 2018-09-27 12:54:00  200000     2
12         3 2018-09-25 14:16:00  150000     2
13         1 2018-09-28 12:54:00  200000     2
14         3 2018-09-28 14:16:00  150000     2
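One caveat with the week column: a week-of-year number on its own wraps around at the turn of the year, so data that spans December and January would be numbered out of order. The sketch below is one possible workaround, not part of the original answer: it labels each row by the Monday that starts its week and then numbers those Mondays in chronological order. The sample dates and the names `week_start` and `week` are made up for illustration.

```python
import pandas as pd

# Hypothetical timestamps that cross a year boundary (not from the question's files)
dates = pd.Series(pd.to_datetime(
    ["2018.12.28.10.00", "2018.12.31.09.30", "2019.01.02.11.15", "2019.01.08.08.00"],
    format="%Y.%m.%d.%H.%M",
))

# Monday (at midnight) of the week each timestamp falls in
week_start = dates.dt.normalize() - pd.to_timedelta(dates.dt.weekday, unit="D")

# Number the distinct week starts chronologically, beginning at 1
week = pd.factorize(week_start, sort=True)[0] + 1
print(list(week))
```

Note that this numbers only the weeks that actually contain data, so a week with no rows does not leave a gap in the numbering.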

#aggregate sum
df1 = df.groupby('Id place', as_index=False)['Value'].sum()
print (df1)
   Id place    Value
0         1  1156000
1         2   460000
2         3   710000

#aggregate mean
df2 = df.groupby('week', as_index=False)['Value'].mean()
print (df2)
   week          Value
0     1  155428.571429
1     2  154750.000000

#aggregate mean per 2 columns
df3 = df.groupby(['Id place','week'], as_index=False)['Value'].mean()
print (df3)

   Id place  week          Value
0         1     1  159333.333333
1         1     2  169500.000000
2         2     1  175000.000000
3         2     2  110000.000000
4         3     1  130000.000000
5         3     2  150000.000000

#write output DataFrames to files
df1.to_csv('out1.csv', index=False)
df2.to_csv('out2.csv', index=False)
df3.to_csv('out3.csv', index=False)
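The question shows each weekly average next to its sum and row count, e.g. (1088000 / 7). As a rough, self-contained sketch, assuming the sample rows are inlined as a string instead of being read from the `files/` directory, the three figures can be produced together by passing a list of aggregations to `agg`. The names `csv_text` and `summary` are illustrative only.

```python
import io
import pandas as pd

# Rows of file1.csv and file2.csv from the question, inlined so the example runs on its own
csv_text = """Id place,Date and hour, Value
1,2018.09.17.12.54,200000
2,2018.09.18.14.16,150000
1,2018.09.19.15.06,78000
3,2018.09.17.16.26,110000
2,2018.09.20.13.54,200000
3,2018.09.17.14.16,150000
1,2018.09.21.12.54,200000
1,2018.09.24.12.54,200000
3,2018.09.24.14.16,150000
1,2018.09.24.15.06,78000
2,2018.09.26.16.26,110000
1,2018.09.27.12.54,200000
3,2018.09.25.14.16,150000
1,2018.09.28.12.54,200000
3,2018.09.28.14.16,150000
"""

# Strip the stray space in the ' Value' header, as in the answer above
df = pd.read_csv(io.StringIO(csv_text)).rename(columns=str.strip)
df["Date and hour"] = pd.to_datetime(df["Date and hour"], format="%Y.%m.%d.%H.%M")

# ISO week number as a plain integer, renumbered so the earliest week is 1
iso_week = df["Date and hour"].dt.isocalendar().week.astype("int64")
df["week"] = pd.factorize(iso_week, sort=True)[0] + 1

# Sum, row count and mean of Value for each week, side by side
summary = df.groupby("week")["Value"].agg(["sum", "count", "mean"])
print(summary)
```

Seeing the sum and count alongside the mean makes it easy to check the result against the hand-calculated figures in the question.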

This is definitely not the recommended way, and pandas is by far the better approach, but the manual way of doing this would be to use defaultdicts to group your items and perform the calculations at the end.

Demo:

from csv import reader
from os import listdir
from collections import defaultdict
from datetime import datetime
from operator import itemgetter
from pprint import pprint

# Collect sums first in a defaultdict
sums = defaultdict(list)

# Collect dates separately since they are more complicated
dates = []

# Get all csv files and open them
for file in listdir("."):
    if file.endswith(".csv"):
        with open(file, newline="") as f:
            csv_reader = reader(f)

            # Skip headers
            next(csv_reader)

            # Separately get sums and dates stuff
            for place, date, value in csv_reader:
                sums[int(place)].append(int(value))
                dates.append(
                    (place, datetime.strptime(date, "%Y.%m.%d.%H.%M"), int(value))
                )

# Print out sum of columns
sum_column_values = {k: sum(v) for k, v in sums.items()}
pprint(sum_column_values)

# ISO week number of the earliest date (despite its name, min_date holds a week number)
min_date = min(map(itemgetter(1), dates)).date().isocalendar()[1]

# Collect weeks stuff in separate dicts
weeks = defaultdict(list)
place_weeks = defaultdict(list)

for place, date, value in dates:

    # Week number relative to the first week (assumes all dates fall in the same ISO year)
    week_number = date.date().isocalendar()[1] - min_date + 1

    # Collect week stuff
    weeks[week_number].append(value)
    place_weeks[int(place), week_number].append(value)

# Print out week averages
week_averages = {k: sum(v) / len(v) for k, v in weeks.items()}
pprint(week_averages)

# Print out place/week averages
place_week_averages = {k: sum(v) / len(v) for k, v in place_weeks.items()}
pprint(place_week_averages)

This gives the following results, stored in separate dictionaries:

# place sums
{1: 1156000, 2: 460000, 3: 710000}

# week averages
{1: 155428.57142857142, 2: 154750.0}

# place/week averages
{(1, 1): 159333.33333333334,
 (1, 2): 169500.0,
 (2, 1): 175000.0,
 (2, 2): 110000.0,
 (3, 1): 130000.0,
 (3, 2): 150000.0}
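The question asks for the results to be saved to new CSV files, while the demo above only prints them. A minimal sketch of that last step with the standard `csv` module follows. The helper `write_csv`, the file names, and the use of a temporary directory as the output location are all assumptions made for illustration; the dictionaries are copied from the printed results.

```python
import csv
import tempfile
from pathlib import Path


def write_csv(path, header, rows):
    """Write one header row followed by the data rows to path."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)


# Result dictionaries as printed above
sum_column_values = {1: 1156000, 2: 460000, 3: 710000}
place_week_averages = {
    (1, 1): 159333.33333333334,
    (1, 2): 169500.0,
    (2, 1): 175000.0,
    (2, 2): 110000.0,
    (3, 1): 130000.0,
    (3, 2): 150000.0,
}

# A temporary directory stands in for the real output location
with tempfile.TemporaryDirectory() as tmp:
    sums_path = Path(tmp) / "place_sums.csv"  # hypothetical file name
    write_csv(sums_path, ["Id place", "Value"], sorted(sum_column_values.items()))

    averages_path = Path(tmp) / "place_week_averages.csv"  # hypothetical file name
    write_csv(
        averages_path,
        ["Id place", "Week", "average Value"],
        [
            (place, week, round(average, 2))
            for (place, week), average in sorted(place_week_averages.items())
        ],
    )

    # Read the files back to show what was written
    sums_text = sums_path.read_text()
    averages_text = averages_path.read_text()

print(sums_text)
print(averages_text)
```

Opening the files with `newline=""` follows the `csv` module's documented recommendation and avoids blank lines between rows on Windows.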


