
Python: Average values in a CSV file based on value of another column

I am a noob and I have a large CSV file with data structured like this (with a lot more columns):

State  daydiff
CT     5.5
CT     6.5
CT     6.25
NY     3.2
NY     3.225
PA     7.522
PA     4.25

I want to output a new CSV where daydiff is averaged for each State, like this:

State  daydiff
CT     6.083
NY     3.2125
PA     5.886

I have tried numerous ways, and the cleanest seemed to be pandas groupby, but when I run the code below:

import pandas as pd

df = pd.read_csv('C:...input.csv')
df.groupby('State')['daydiff'].mean()

df.to_csv('C:...AverageOutput.csv')

I get a file that is identical to the original file, but with a counter added as the first column with no header:

,State,daydiff
0,CT,5.5
1,CT,6.5
2,CT,6.25
3,NY,3.2
4,NY,3.225
5,PA,7.522
6,PA,4.25

I was also hoping to limit the new daydiff average to two decimal places (hundredths). Thanks

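The "problem" with the counter is that to_csv writes the index by default. For a plain DataFrame you can suppress it with df.to_csv('C:...AverageOutput.csv', index=False). Note also that your code never keeps the result of the groupby: df.groupby('State')['daydiff'].mean() returns a new Series indexed by State and leaves df unchanged, so df.to_csv writes the original data. Assign the result to a variable and write that instead.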
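A minimal sketch of the index behaviour, assuming the sample rows from the question are loaded into an in-memory DataFrame (calling to_csv() without a path returns the CSV text instead of writing a file). It also shows that after the groupby the State names live in the index, so either keep the index or call reset_index() before passing index=False.

```python
import pandas as pd

# In-memory copy of the sample data from the question (assumed layout).
df = pd.DataFrame({
    "State": ["CT", "CT", "CT", "NY", "NY", "PA", "PA"],
    "daydiff": [5.5, 6.5, 6.25, 3.2, 3.225, 7.522, 4.25],
})

# Default: the RangeIndex (0, 1, 2, ...) is written as an unnamed first column.
with_index = df.to_csv()

# index=False: only the real columns are written.
without_index = df.to_csv(index=False)

# After groupby, State *is* the index; reset_index() turns it back into an
# ordinary column, after which index=False no longer loses the State names.
averages = df.groupby("State")["daydiff"].mean().reset_index()
grouped_csv = averages.to_csv(index=False)

print(with_index.splitlines()[0])     # ,State,daydiff
print(without_index.splitlines()[0])  # State,daydiff
print(grouped_csv)
```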

You can control the output format of daydiff by converting it to a string: df.daydiff = df.daydiff.apply(lambda x: '{:.2f}'.format(x))
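Converting to strings works, but it turns the values into text. A sketch comparing it with two alternatives pandas also provides, Series.round and the float_format argument of to_csv, again on an in-memory copy of the sample data (an assumption for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "State": ["CT", "CT", "CT", "NY", "NY", "PA", "PA"],
    "daydiff": [5.5, 6.5, 6.25, 3.2, 3.225, 7.522, 4.25],
})

averages = df.groupby("State")["daydiff"].mean()  # Series indexed by State

# Option 1 (as in the answer): format each value; the result holds str, not float.
as_text = averages.apply(lambda x: "{:.2f}".format(x))

# Option 2: keep floats, but round the numbers themselves.
rounded = averages.round(2)

# Option 3: keep full precision in memory and format only when writing.
csv_text = averages.to_csv(float_format="%.2f")

print(as_text)
print(rounded)
print(csv_text)
```

Which to pick is a judgement call: round(2) keeps numeric values you can still compute with, while float_format leaves the data untouched and only affects the written file.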

Your complete code should be:

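import pandas as pd

df = pd.read_csv('C:...input.csv')
df2 = df.groupby('State')['daydiff'].mean().apply(lambda x: '{:.2f}'.format(x))  # new Series indexed by State
df2.to_csv('C:...AverageOutput.csv')  # keep the index here: it now holds the State names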
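A self-contained version of the same pipeline that runs without the real files: io.StringIO objects stand in for input.csv and AverageOutput.csv, and average_by_state is a hypothetical helper name. It assumes the real file is comma-delimited with State and daydiff headers; extra columns are simply ignored by the groupby. This sketch formats with float_format instead of converting to strings.

```python
import io

import pandas as pd

# Stand-in for input.csv (assumption: comma-delimited, same headers as the question).
SAMPLE = """State,daydiff
CT,5.5
CT,6.5
CT,6.25
NY,3.2
NY,3.225
PA,7.522
PA,4.25
"""


def average_by_state(source, target):
    """Hypothetical helper: read CSV from source, write per-State mean daydiff to target."""
    df = pd.read_csv(source)
    averages = df.groupby("State")["daydiff"].mean()  # Series indexed by State
    # Keep the default index=True (it holds the State names);
    # round to hundredths only in the written output.
    averages.to_csv(target, float_format="%.2f")


out = io.StringIO()
average_by_state(io.StringIO(SAMPLE), out)
print(out.getvalue())
```

With real files, pass the paths instead of the StringIO objects, e.g. average_by_state('C:...input.csv', 'C:...AverageOutput.csv').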


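Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.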

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM