
Pandas DataFrame: Grouping Rows?

There are two parts to what I'm trying to accomplish.

  1. I have a DataFrame where each company is listed in 2 consecutive rows. The first row for each company holds the Apple (iOS) data and the second holds the Android data.
  • I need the 'App Views' column represented as an int, and the other columns expressed as a % of the views. (So if there are 5000 app views, the next column for Apple would be installs, showing the % of users who viewed the app and then installed it.) I'll need several more columns beyond installs, but to keep this short I'm leaving it at that.
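
To make the conversion in point 1 concrete, here is a minimal sketch. It assumes raw install counts are available alongside the views; the column names `app_views`, `installs`, `install_rate` and `install_pct` and the sample numbers are made up for illustration and are not from a real dataset.

```python
import pandas as pd

# Hypothetical raw counts, for illustration only.
raw = pd.DataFrame({
    "app_views": ["5000", "22000"],  # e.g. read from a CSV as strings
    "installs": [2500, 12540],
})

# Store the views as integers.
raw["app_views"] = raw["app_views"].astype(int)

# Share of viewers who went on to install, kept as a float for later math.
raw["install_rate"] = raw["installs"] / raw["app_views"]

# Optional display column formatted as a percentage string.
raw["install_pct"] = raw["install_rate"].map("{:.0%}".format)

print(raw)
```

Keeping the numeric `install_rate` column separate from the formatted `install_pct` string means further calculations can still use the float.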

That's the first part of the challenge. For the 2nd part of the challenge:

  1. I really need to be able to build a big DataFrame full of fake data. Maybe with Faker? The fake data should be populated with random values: for each company I need a random number for Apple views and a 0 for Android, and in the next row a random number for Android views and a 0 for Apple. Then I'll need to take a % of those views, with randomized percentages in the next column.

The table below is the result I am looking for:

(If this seems like a terrible idea to do in Python and would be easier to do in Excel somehow, that's a great answer too. I just need someone to point me in the right direction; in that case I could import a .CSV into a DataFrame.)

   Company Name  Apple App Views  Apple Install  Droid Views  Droid Install
0  Zynga                    5000           0.50         0.00           0.00
1  Zynga                       0              0        15000           0.33
2  EA Mobile               22000           0.57         0.00           0.00
3  EA Mobile                   0              0        26000           0.49

 

              
import numpy as np
import pandas as pd

# lists with hand-picked values
app_views = [4000, 2222, 9999]
app_install = [0, 0.3, 0.83]

# generate a NumPy array of 3 random integers from 1000 to 9999
# (the upper bound, 10000, is exclusive)
random_app_views = np.random.randint(1000, 10000, size=3)

# generate a NumPy array of 3 random floats in [0, 1)
random_app_install = np.random.uniform(0, 1, size=3)

# each dict key becomes a column name
df = pd.DataFrame({
    'app_views': app_views,
    'app_install': app_install,
    'random_app_views': random_app_views,
    'random_app_install': random_app_install
})

This will produce a DataFrame like the following (the random columns will differ on each run):

   app_views  app_install  random_app_views  random_app_install
0       4000         0.00              2196            0.626350
1       2222         0.30              6917            0.412264
2       9999         0.83              3291            0.303517
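
The snippet above gives one row per record. To get the two-rows-per-company layout from the question (Apple row first, Android row second, zeros for the other platform), one possible approach is to build one frame per platform and interleave them. This is only a sketch: the company list, the value ranges, the seed and the column names (modeled on the question's table) are assumptions, and the names could just as well come from Faker.

```python
import numpy as np
import pandas as pd

# Hypothetical company list; swap in your own names.
companies = ["Zynga", "EA Mobile", "Rovio"]
n = len(companies)

rng = np.random.default_rng(seed=0)  # seeded so runs are repeatable

# Apple rows: random views and install rate, zeros in the Android columns.
apple = pd.DataFrame({
    "Company Name": companies,
    "Apple App Views": rng.integers(1000, 30000, size=n),
    "Apple Install": rng.uniform(0, 1, size=n).round(2),
    "Droid Views": 0,
    "Droid Install": 0.0,
})

# Android rows: the mirror image of the Apple rows.
droid = pd.DataFrame({
    "Company Name": companies,
    "Apple App Views": 0,
    "Apple Install": 0.0,
    "Droid Views": rng.integers(1000, 30000, size=n),
    "Droid Install": rng.uniform(0, 1, size=n).round(2),
})

# After concat the index is 0..n-1 twice. A stable sort on the index puts
# each company's Apple row directly above its Android row.
fake = (
    pd.concat([apple, droid])
    .sort_index(kind="mergesort")
    .reset_index(drop=True)
)

print(fake)
```

Increasing the length of `companies` scales this to as many rows as needed, and `fake.to_csv("fake.csv", index=False)` would write the result out if a CSV is more convenient.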

Hope this is enough to get you started. Good luck!

