Python: Averaging values in a CSV file based on the value of another column

Question

I am a noob and I have a large CSV file with data structured like this (with a lot more columns):

State  daydiff
CT     5.5
CT     6.5
CT     6.25
NY     3.2
NY     3.225
PA     7.522
PA     4.25

I want to output a new CSV where daydiff is averaged for each State, like this:

State  daydiff
CT     6.083
NY     3.2125
PA     5.886

I have tried numerous approaches, and the cleanest seemed to be pandas groupby, but when I run the code below:

import pandas as pd

df = pd.read_csv('C:...input.csv')
df.groupby('State')['daydiff'].mean()

df.to_csv('C:...AverageOutput.csv')

I get a file that is identical to the original file, but with a counter added in the first column with no header:

,State,daydiff
0,CT,5.5
1,CT,6.5
2,CT,6.25
3,NY,3.2
4,NY,3.225
5,PA,7.522
6,PA,4.25

I would also like the new daydiff averages limited to two decimal places (hundredths). Thanks

Answer 1

The "problem" with the counter arises because to_csv writes the index by default. To suppress it, use df.to_csv('C:...AverageOutput.csv', index=False).
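To see the index behaviour in isolation, here is a minimal sketch. It assumes a tiny two-row frame built in memory as a stand-in for the question's file, and it calls to_csv without a path so the CSV text comes back as a string instead of being written to disk.

```python
import pandas as pd

# Hypothetical two-row stand-in for the question's input file
df = pd.DataFrame({"State": ["CT", "NY"], "daydiff": [5.5, 3.2]})

# With no path argument, to_csv returns the CSV text as a string
with_index = df.to_csv()                 # default is index=True
without_index = df.to_csv(index=False)   # drops the unnamed counter column

print(with_index)
print(without_index)
```

The first printout starts with the header ",State,daydiff" and prefixes each row with 0, 1, and so on, which matches the unwanted counter in the question. The second contains only the State and daydiff columns.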

You can control the output format of daydiff by converting it to a string: df.daydiff = df.daydiff.apply(lambda x: '{:.2f}'.format(x))
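The string conversion above changes the values from numbers to text. As a hedged comparison, the sketch below puts it next to Series.round, which keeps the values numeric. The Series named means is a hypothetical stand-in holding the three averages from the question, not something taken from the original code.

```python
import pandas as pd

# Hypothetical stand-in for the grouped averages (values from the question)
means = pd.Series([6.0833333, 3.2125, 5.886],
                  index=["CT", "NY", "PA"], name="daydiff")

# Approach from the answer: every value becomes a two-decimal string
as_text = means.apply(lambda x: "{:.2f}".format(x))

# Alternative: round to two decimals but keep floating-point values
as_float = means.round(2)

print(as_text.tolist())
print(as_float.tolist())
```

Which one fits depends on what happens next. Strings always show exactly two decimals (6.00 stays "6.00"), while rounded floats remain usable for further arithmetic but drop trailing zeros when printed.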

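Note that groupby returns a new object rather than modifying df, so the result has to be assigned and then written out. Your complete code should be: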

import pandas as pd

df = pd.read_csv('C:...input.csv')
# Average daydiff per State, then format each mean with two decimals
df2 = df.groupby('State')['daydiff'].mean().apply(lambda x: '{:.2f}'.format(x))
df2.to_csv('C:...AverageOutput.csv')
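For anyone who wants to try this without the original files, here is a self-contained variant, sketched under a few assumptions. In-memory io.StringIO buffers replace the elided file paths, the sample rows are the ones from the question, and as_index=False together with float_format is swapped in for the string conversion so that State stays a regular column and the averages stay numeric until they are written.

```python
import io
import pandas as pd

# The question's sample rows, held in memory instead of the elided input path
raw = io.StringIO(
    "State,daydiff\n"
    "CT,5.5\nCT,6.5\nCT,6.25\n"
    "NY,3.2\nNY,3.225\n"
    "PA,7.522\nPA,4.25\n"
)
df = pd.read_csv(raw)

# as_index=False keeps State as an ordinary column in the resulting DataFrame
averages = df.groupby("State", as_index=False)["daydiff"].mean()

# Write to a buffer (stand-in for the output file): no index column,
# and floats formatted with two decimals at write time
out = io.StringIO()
averages.to_csv(out, index=False, float_format="%.2f")
print(out.getvalue())
```

The printed text has the header State,daydiff followed by CT,6.08, NY,3.21 and PA,5.89. Replacing the two buffers with real file paths should give the output file the question asks for.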

Answer 1 (accepted 2017-10-10 14:31:50)
