Pandas groupby rows with csv
I have a large CSV file from which I am pulling two columns (Month and Cancelled), and I need to display the results in a dataframe. The months are integers (e.g. January is 1 in the CSV) and I need to convert them to strings.
What I'm having trouble with is setting the correct indices and grouping the data from the months together.
import pandas as pd
data = pd.read_csv('data.csv', encoding='latin-1', usecols=['Month','Cancelled'])
grouped = data.groupby(axis=1)
The expected output I'm looking for is along the lines of:
Cancelled
January 19891
But I am currently receiving:
Month Cancelled
0 1 0
1 1 0
2 1 0
Since you didn't post any raw input data, let's consider this quick example just to show how to group values in pandas.
After reading your data into a dataframe, you can group the rows by one of the columns with groupby(['month']) and then apply a function to each group. Pandas includes a number of common ones such as mean(), max(), median(), etc.; you can use sum(), for example:
df.groupby(['month']).sum()
Or pass any other function using aggregate (numpy must be imported for this one):
df.groupby(['month']).aggregate(numpy.sum)
import pandas as pd
from io import StringIO
data="""month cancelled
0 1 1
1 1 0
2 0 1
3 1 1
4 0 0
5 1 1
6 1 1
7 2 1
8 2 1
9 1 1"""
# Python 3: data is already a str, so no decode() is needed
df = pd.read_csv(StringIO(data), sep=r'\s+')
print(df.groupby(['month']).sum())
RESULT
cancelled
month
0 1
1 5
2 2
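To get closer to the output asked for in the question (month names as the index, one total per month), the same groupby can be followed by a rename of the index. This is only a minimal sketch: the inline csv_text is a made-up stand-in for data.csv, which was not posted, and it assumes Month holds integers 1 to 12. The names come from the standard library's calendar.month_name, which follows the current locale (English by default).

```python
import calendar
from io import StringIO

import pandas as pd

# Hypothetical stand-in for data.csv; the real file is not shown in the question.
csv_text = """Month,Cancelled
1,0
1,1
1,1
2,0
2,1
"""

data = pd.read_csv(StringIO(csv_text), usecols=["Month", "Cancelled"])

# Sum cancellations per month number first, so the rows stay in calendar order.
totals = data.groupby("Month")[["Cancelled"]].sum()

# Swap the integer index for month names; calendar.month_name[1] is "January".
totals.index = totals.index.map(lambda m: calendar.month_name[int(m)])
totals.index.name = None  # drop the "Month" label above the index

print(totals)
```

Grouping on the integer column before renaming keeps January ahead of February; grouping on the names directly would sort them alphabetically instead.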