Data analysis using Pandas

Question

I have two different DataFrames built from a .csv file.

Columns in the file:

Index(['App', 'Category', 'Rating', 'Reviews', 'Size_MBs', 'Installs', 'Type', 'Price', 'Content_Rating', 'Genres', 'Last_Updated', 'Android_Ver'], dtype='object')

First one:

category_installs=df_apps_clean.groupby('Category').agg({'Installs':pd.Series.sum})
category_installs.sort_values('Installs', ascending=True, inplace=True)

This gives the following output:

    Category         Installs
    VIDEO_PLAYERS    3916897200
    FAMILY           4437554490
    PHOTOGRAPHY      4649143130
    SOCIAL           5487841475
    PRODUCTIVITY     5788070180
    TOOLS            8099724500
    COMMUNICATION    11039241530
    GAME             13858762717

Second one:

app_installs = df_apps_clean.groupby('Category').agg({'App':pd.Series.count})
app_installs.sort_values('App', ascending=False)

This gives the following output:

    Category           App
    FAMILY             1606
    GAME               910
    TOOLS              719
    PRODUCTIVITY       301
    PERSONALIZATION    298
    LIFESTYLE          297
    FINANCE            296
    MEDICAL            292
    PHOTOGRAPHY        263
    BUSINESS           262
    SPORTS             260
    COMMUNICATION      257

But when I merge them using the pandas merge function like this:

cat_merged_df = pd.merge(app_installs, category_installs,on='Category', how='inner')

cat_merged_df.sort_values('Installs', ascending=False)

I get this output:

    Category         App_x    Installs       App_y
    GAME             910      13858762717    Ra Ga BaMu.F.O.Brick Breaker BR211:CK
    COMMUNICATION    257      11039241530    EJ messengerBest Browser BD social networkingD...
    TOOLS            719      8099724500     ei CalcBM speed testCZ Kompasap,wifi testing,i...
    PRODUCTIVITY     301      5788070180     ER AssistBAMMS for BM SQDL Image ManagerEB Sca...
    SOCIAL           203      5487841475     CB HeroesDN BlogHum Ek Hain 2.02UP EB Bill Pay...

Why am I getting three columns, with the App column split into App_x and App_y? There is no such data in the file I'm working with.

Answer 1

If I understand what you want, then maybe this will help.

Make df1:

import pandas as pd

col1 = ['VIDEO_PLAYERS', 'FAMILY', 'PHOTOGRAPHY', 'SOCIAL', 'PRODUCTIVITY', 'TOOLS', 'COMMUNICATION', 'GAME']
col2 = [3916897200, 4437554490, 4649143130, 5487841475, 5788070180, 8099724500, 11039241530, 13858762717]
d = {'Category':col1, 'Installs':col2}

df1 = pd.DataFrame(d)


   Category       Installs
0  VIDEO_PLAYERS  3916897200
1  FAMILY         4437554490
2  PHOTOGRAPHY    4649143130
3  SOCIAL         5487841475
4  PRODUCTIVITY   5788070180
5  TOOLS          8099724500
6  COMMUNICATION  11039241530
7  GAME           13858762717

Make df2:

col1 = ['FAMILY', 'GAME', 'TOOLS', 'PRODUCTIVITY', 'PERSONALIZATION', 'LIFESTYLE', 'FINANCE', 'MEDICAL', 'PHOTOGRAPHY', 'BUSINESS', 'SPORTS', 'COMMUNICATION']
col2 = [1606, 910, 719, 301, 298, 297, 296, 292, 263, 262, 260, 257]
d = {'Category':col1, 'App':col2}

df2 = pd.DataFrame(d)

    Category         App
0   FAMILY           1606
1   GAME             910
2   TOOLS            719
3   PRODUCTIVITY     301
4   PERSONALIZATION  298
5   LIFESTYLE        297
6   FINANCE          296
7   MEDICAL          292
8   PHOTOGRAPHY      263
9   BUSINESS         262
10  SPORTS           260
11  COMMUNICATION    257

Merge the two frames on Category:

pd.merge(left=df1, right=df2, on='Category')


    Category        Installs    App
0   FAMILY          4437554490  1606
1   PHOTOGRAPHY     4649143130  263
2   PRODUCTIVITY    5788070180  301
3   TOOLS           8099724500  719
4   COMMUNICATION   11039241530 257
5   GAME            13858762717 910
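
For comparison, the App_x and App_y columns in the question are what pd.merge produces whenever both frames contain a non-key column with the same name: pandas keeps both and appends its default suffixes, _x for the left frame and _y for the right. The sketch below reproduces that under an assumption the question does not confirm, namely that category_installs in the asker's session also carried an App column (for example, from an earlier aggregation that summed the string column and so concatenated the app names). The frames `left` and `right` and their values are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-ins for the asker's frames; the values are illustrative.
left = pd.DataFrame({'Category': ['GAME', 'TOOLS'], 'App': [910, 719]})
right = pd.DataFrame({
    'Category': ['GAME', 'TOOLS'],
    'Installs': [13858762717, 8099724500],
    # Assumed leftover column: summing strings concatenates them.
    'App': ['app_a' + 'app_b', 'app_c' + 'app_d'],
})

# Both frames have a non-key column named 'App', so merge keeps both
# and disambiguates them with the default suffixes '_x' and '_y'.
clash = pd.merge(left, right, on='Category', how='inner')
print(list(clash.columns))

# Passing only the needed columns of the right frame avoids the clash.
clean = pd.merge(left, right[['Category', 'Installs']], on='Category', how='inner')
print(list(clean.columns))
```

If this is the cause, checking category_installs.columns before the merge should show the extra App column.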

If this is not what you want, please show how you want the output to look and I will update. You can use how= to change the join type.
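
As a small illustration of the how= parameter, the sketch below merges two cut-down frames with three different join types. The names `installs` and `counts` and the three-row data are assumptions chosen to keep the example short; the values are taken from the tables above.

```python
import pandas as pd

# Cut-down frames for illustration only (hypothetical names).
installs = pd.DataFrame({'Category': ['SOCIAL', 'TOOLS', 'GAME'],
                         'Installs': [5487841475, 8099724500, 13858762717]})
counts = pd.DataFrame({'Category': ['GAME', 'TOOLS', 'FINANCE'],
                       'App': [910, 719, 296]})

# inner (the default): only categories present in both frames.
inner = pd.merge(installs, counts, on='Category', how='inner')

# left: every category from `installs`; a missing App count becomes NaN.
left = pd.merge(installs, counts, on='Category', how='left')

# outer: the union of the categories from both frames.
outer = pd.merge(installs, counts, on='Category', how='outer')

print(len(inner), len(left), len(outer))
```

With how='left' or how='outer', categories that appear in only one frame are kept and the missing side is filled with NaN, which is why SOCIAL survives those joins here but not the inner one.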


Answer 1
0 2022-02-22 12:22:47
