Python: Average values in a CSV file based on value of another column
I am a noob and I have a large CSV file with data structured like this (with a lot more columns):
State daydiff
CT 5.5
CT 6.5
CT 6.25
NY 3.2
NY 3.225
PA 7.522
PA 4.25
I want to output a new CSV where the daydiff is averaged for each State, like this:
State daydiff
CT 6.083
NY 3.2125
PA 5.886
I have tried numerous ways, and the cleanest seemed to be to leverage pandas groupby, but when I run the code below:
import pandas as pd
df = pd.read_csv('C:...input.csv')
df.groupby('State')['daydiff'].mean()
df.to_csv('C:...AverageOutput.csv')
I get a file that is identical to the original file but with a counter added in the first column with no header:
,State,daydiff
0,CT,5.5
1,CT,6.5
2,CT,6.25
3,NY,3.2
4,NY,3.225
5,PA,7.522
6,PA,4.25
I was also hoping to limit the new average in daydiff to two decimal places (hundredths). Thanks
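The "problem" with the counter is that the default behaviour of to_csv is to write the index. You should do df.to_csv('C:...AverageOutput.csv', index=False). Note also that groupby(...).mean() returns a new object and does not modify df, so the result has to be assigned to a variable and that variable written out; otherwise the file contains the original, unaveraged rows.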
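To see the index behaviour in isolation, here is a minimal sketch. The small DataFrame is made-up sample data standing in for the real file, and to_csv is called without a path so that it returns the CSV text instead of writing to disk.

```python
import pandas as pd

# Hypothetical sample rows standing in for the real input file.
df = pd.DataFrame({"State": ["CT", "CT", "NY"], "daydiff": [5.5, 6.5, 3.2]})

# Default: the row index is written as an unnamed first column.
with_index = df.to_csv()

# index=False: only the real columns are written.
without_index = df.to_csv(index=False)

print(with_index.splitlines()[0])     # header begins with an empty field
print(without_index.splitlines()[0])  # header holds only the column names
```

The empty first header field in the default output is the "counter with no header" described in the question.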
You can control the output format of daydiff by converting it to a string:
df.daydiff = df.daydiff.apply(lambda x: '{:.2f}'.format(x))
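One thing to keep in mind is that string formatting turns the column into text. If the averages should stay numeric, rounding is a possible alternative. The sketch below compares the two on made-up sample data; the variable names are only illustrative.

```python
import pandas as pd

# Hypothetical sample rows for a single State.
df = pd.DataFrame({"State": ["CT", "CT", "CT"], "daydiff": [5.5, 6.5, 6.25]})
means = df.groupby("State")["daydiff"].mean()

# String formatting: values become text, always with two decimals.
as_text = means.apply(lambda x: "{:.2f}".format(x))

# Rounding: values stay floats, so further arithmetic still works.
as_number = means.round(2)

print(type(as_text["CT"]), as_text["CT"])
print(as_number.dtype, as_number["CT"])
```

The formatted version always shows two decimals (for example "3.20"), while a rounded float is written without trailing zeros (3.2), so the choice depends on whether the fixed width or the numeric type matters more.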
Your complete code should be:
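import pandas as pd
df = pd.read_csv('C:...input.csv')
# Average daydiff per State, then format each mean to two decimals
df2 = df.groupby('State')['daydiff'].mean().apply(lambda x: '{:.2f}'.format(x))
# The index now holds the State values, so it is kept in the output
df2.to_csv('C:...AverageOutput.csv')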
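As a variation on the code above, the following sketch keeps State as an ordinary column and lets to_csv do the two-decimal formatting. It is not the answer's method, just one possible alternative: the rows from the question are inlined through io.StringIO so the example runs without a file, and the CSV is returned as text rather than written to the (elided) output path.

```python
import io
import pandas as pd

# Sample rows from the question, inlined so the sketch runs without a file.
raw = io.StringIO(
    "State,daydiff\n"
    "CT,5.5\nCT,6.5\nCT,6.25\n"
    "NY,3.2\nNY,3.225\n"
    "PA,7.522\nPA,4.25\n"
)
df = pd.read_csv(raw)

# as_index=False keeps State as a regular column in the result.
averages = df.groupby("State", as_index=False)["daydiff"].mean()

# float_format applies printf-style formatting to floats on output;
# index=False drops the 0, 1, 2 counter.
csv_text = averages.to_csv(index=False, float_format="%.2f")
print(csv_text)
```

With the sample rows this prints a State,daydiff header followed by CT,6.08, NY,3.21 and PA,5.89. Passing a file path as the first argument to to_csv would write the same text to disk.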