Data analysis using Pandas
I have two different DataFrames built from a .csv file.
Columns in the file:
Index(['App', 'Category', 'Rating', 'Reviews', 'Size_MBs', 'Installs', 'Type','Price', 'Content_Rating', 'Genres', 'Last_Updated','Android_Ver'],dtype='object')
First one:
category_installs=df_apps_clean.groupby('Category').agg({'Installs':pd.Series.sum})
category_installs.sort_values('Installs', ascending=True, inplace=True)
which gives this output:
Category        Installs
VIDEO_PLAYERS   3916897200
FAMILY          4437554490
PHOTOGRAPHY     4649143130
SOCIAL          5487841475
PRODUCTIVITY    5788070180
TOOLS           8099724500
COMMUNICATION   11039241530
GAME            13858762717
Second one:
app_installs = df_apps_clean.groupby('Category').agg({'App':pd.Series.count})
app_installs.sort_values('App', ascending=False)
which gives this output:
Category          App
FAMILY            1606
GAME              910
TOOLS             719
PRODUCTIVITY      301
PERSONALIZATION   298
LIFESTYLE         297
FINANCE           296
MEDICAL           292
PHOTOGRAPHY       263
BUSINESS          262
SPORTS            260
COMMUNICATION     257
But when I merge them with the pandas merge function like this:
cat_merged_df = pd.merge(app_installs, category_installs,on='Category', how='inner')
cat_merged_df.sort_values('Installs', ascending=False)
I get this output:
Category        App_x   Installs      App_y
GAME            910     13858762717   Ra Ga BaMu.F.O.Brick Breaker BR211:CK
COMMUNICATION   257     11039241530   EJ messengerBest Browser BD social networkingD...
TOOLS           719     8099724500    ei CalcBM speed testCZ Kompasap,wifi testing,i...
PRODUCTIVITY    301     5788070180    ER AssistBAMMS for BM SQDL Image ManagerEB Sca...
SOCIAL          203     5487841475    CB HeroesDN BlogHum Ek Hain 2.02UP EB Bill Pay...
Why am I getting three columns, with the App column split into App_x and App_y? There is no such data in the file I'm working on.
If I understand what you want, then maybe this will help.
Make df1:
import pandas as pd
col1 = ['VIDEO_PLAYERS', 'FAMILY', 'PHOTOGRAPHY', 'SOCIAL', 'PRODUCTIVITY', 'TOOLS', 'COMMUNICATION', 'GAME']
col2 = [3916897200, 4437554490, 4649143130, 5487841475, 5788070180, 8099724500, 11039241530, 13858762717]
d = {'Category':col1, 'Installs':col2}
df1 = pd.DataFrame(d)
Category Installs
0 VIDEO_PLAYERS 3916897200
1 FAMILY 4437554490
2 PHOTOGRAPHY 4649143130
3 SOCIAL 5487841475
4 PRODUCTIVITY 5788070180
5 TOOLS 8099724500
6 COMMUNICATION 11039241530
7 GAME 13858762717
Make df2:
col1 = ['FAMILY', 'GAME', 'TOOLS', 'PRODUCTIVITY', 'PERSONALIZATION', 'LIFESTYLE', 'FINANCE', 'MEDICAL', 'PHOTOGRAPHY', 'BUSINESS', 'SPORTS', 'COMMUNICATION']
col2 = [1606, 910, 719, 301, 298, 297, 296, 292, 263, 262, 260, 257]
d = {'Category':col1, 'App':col2}
df2 = pd.DataFrame(d)
Category App
0 FAMILY 1606
1 GAME 910
2 TOOLS 719
3 PRODUCTIVITY 301
4 PERSONALIZATION 298
5 LIFESTYLE 297
6 FINANCE 296
7 MEDICAL 292
8 PHOTOGRAPHY 263
9 BUSINESS 262
10 SPORTS 260
11 COMMUNICATION 257
Merge the two frames on Category:
pd.merge(left=df1, right=df2, on='Category')
Category Installs App
0 FAMILY 4437554490 1606
1 PHOTOGRAPHY 4649143130 263
2 PRODUCTIVITY 5788070180 301
3 TOOLS 8099724500 719
4 COMMUNICATION 11039241530 257
5 GAME 13858762717 910
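As for the App_x / App_y columns in the question: pandas adds the suffixes _x and _y whenever both frames contain a non-key column with the same name. The following is only a sketch under an assumption I cannot verify from the post, namely that the category_installs frame in your session also carried an App column (for example from an earlier aggregation that summed every column, which concatenates strings). The toy values and the names counts, totals, clashing and clean are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data, not taken from the original file.
counts = pd.DataFrame({'Category': ['GAME', 'TOOLS'], 'App': [910, 719]})

# Assumption: the right-hand frame also has an 'App' column holding
# concatenated app names, as a string sum would produce.
totals = pd.DataFrame({
    'Category': ['GAME', 'TOOLS'],
    'Installs': [13858762717, 8099724500],
    'App': ['app_aapp_b', 'app_capp_d'],
})

# 'App' exists on both sides and is not the join key,
# so merge renames the two copies with the default suffixes.
clashing = pd.merge(counts, totals, on='Category', how='inner')
print(list(clashing.columns))

# Selecting only the needed columns on the right avoids the clash.
clean = pd.merge(counts, totals[['Category', 'Installs']], on='Category', how='inner')
print(list(clean.columns))
```

If that is what happened, checking category_installs.columns before merging should show the extra App column.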
If this is not what you want, please show how you want the output to look and I will update. You can use how= to change the join type.
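To illustrate how= on a smaller scale, here is a rough sketch using three of the categories from the frames above. SOCIAL appears only in the installs frame and FINANCE only in the app-count frame, so the join type decides which rows survive. The names small1, small2, inner, left_join and outer are introduced just for this example.

```python
import pandas as pd

# Small subsets of the values shown above, to keep the example short.
small1 = pd.DataFrame({'Category': ['SOCIAL', 'GAME'],
                       'Installs': [5487841475, 13858762717]})
small2 = pd.DataFrame({'Category': ['GAME', 'FINANCE'],
                       'App': [910, 296]})

# inner: only categories present in both frames (GAME)
inner = pd.merge(left=small1, right=small2, on='Category', how='inner')

# left: every category from small1; App is NaN where small2 has no match
left_join = pd.merge(left=small1, right=small2, on='Category', how='left')

# outer: every category from either frame, with NaN for the missing side
outer = pd.merge(left=small1, right=small2, on='Category', how='outer')

print(inner)
print(left_join)
print(outer)
```

The default is how='inner', which is why the merge above dropped categories such as VIDEO_PLAYERS and SOCIAL that appear in only one frame.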