Groupby mean in pandas python
I have a CSV file consisting of 5 fields. Some sample data:
market_name,vendor_name,price,name,ship_from
'Greece',03wel,1.79367196,huhif,Germany
'Greece',le,0.05880975,fdfd,Germany
'Mlkio',dpg,0.11344859,fdfd,Germany
'Greece',gert,0.18655316,,Germany
'Tu',roland,0.52856728,fdfsdv,Germany
'ghuo',andy,0.52856728,jhjhj,Germany
'ghuo',didier,0.02085452,fsdfdf,Germany
'arsen',roch,0.02578377,uykujkj,Germany
'arsen',dpg,0.10010169,wrefrewrf,Germany
'arsen',dpg,0.06415609,jhgjhg,Germany
'arsen',03wel,0.02578377,gfdgb,Germany
'giar',03wel,0.02275039,gfhfbf,Germany
'giar',03wel,0.42751765,sdgfdgfg,Germany
In this file there are multiple records for every vendor. I want to find every unique value of the field vendor_name and also calculate the average price for each vendor. I am using the following script:
import pandas as pd
import numpy as np
import csv
from random import randint
ds = pd.read_csv("sxedonetoimo2.csv",
dtype={"vendor_name": object, "name" : object,
"ship_from" : object, "price": object})
ds['ship_from']=ds.ship_from.str.lower()
print(ds.dtypes)
pd.to_numeric(ds['price'], errors='coerce')
d = { 'name': pd.Series.nunique,
'ship_from' : lambda x: randint(1,2) if (x==('eu'or'europe'or'eu'or'europeanunion'or'worldwide'or'us'or'unitedstates'or'usa'or'us'or'ww'or'wweu'or'euww'or'internet')).any() else randint(3,20)
,'price': ds.groupby('vendor_name')['price'].mean()
}
result = ds.groupby('vendor_name').agg(d)
result.to_csv("scaled_upd.csv")
But I am getting this error:
raise DataError('No numeric types to aggregate')
pandas.core.base.DataError: No numeric types to aggregate
In my CSV file, the values of the price field are only numbers. If I change the type of that field to float, it raises an error that a specific string cannot be parsed. I am really confused. Any help?
Just use read_csv(), groupby() and agg():
import pandas as pd
# Let read_csv infer the dtypes so that price is parsed as a float column
df = (pd.read_csv('test.csv')
      .groupby('vendor_name')
      .agg({'price': 'mean', 'name': lambda x: x.nunique()}))
Yields:
                price  name
vendor_name
03wel        0.567431     4
andy         0.528567     1
didier       0.020855     1
dpg          0.092569     3
gert         0.186553     0
le           0.058810     1
roch         0.025784     1
roland       0.528567     1
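A likely reason the script in the question fails is that price is read with dtype object and the result of pd.to_numeric() is never assigned back, so the column is still non-numeric when it is aggregated. The sketch below shows one way to keep the explicit conversion. It is only an illustration under stated assumptions: the data is an inline copy of a few rows from the question, and the row with the value "unknown" is a hypothetical bad entry added to show what errors='coerce' does.

```python
import io
import pandas as pd

# Inline sample based on the question's data. The 'unknown' price is a
# hypothetical unparseable value, not part of the original file.
csv_text = """market_name,vendor_name,price,name,ship_from
'Greece',03wel,1.79367196,huhif,Germany
'Greece',le,0.05880975,fdfd,Germany
'Mlkio',dpg,0.11344859,fdfd,Germany
'arsen',dpg,0.10010169,wrefrewrf,Germany
'arsen',dpg,unknown,jhgjhg,Germany
"""

# Reading price as object mirrors the question's read_csv call.
ds = pd.read_csv(io.StringIO(csv_text), dtype={"price": object})

# pd.to_numeric returns a new Series, so assign it back to the column.
# errors='coerce' turns values that cannot be parsed into NaN.
ds["price"] = pd.to_numeric(ds["price"], errors="coerce")

# Each dict value is a function (or function name) applied per group.
# mean() skips NaN, so the 'unknown' row does not affect the average.
result = ds.groupby("vendor_name").agg({"price": "mean", "name": "nunique"})
print(result)
```

After the conversion, ds["price"].isna() can be used to find the rows that could not be parsed, which may help explain the "cannot be parsed" error seen when forcing the column to float.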