Jupyter Notebook - Python Code
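I'm working on a Jupyter Notebook that analyzes some data that looks like this:
I have to find the following information:
This is what I've tried, but it doesn't work, and I'm completely confused about how to do part b.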
# Import relevant packages/modules
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats
# Import relevant csv data file
data = pd.read_csv("C:/Users/Hanna/Desktop/Sheridan College/Statistics for Data Science/Assignment 1/MATH37198_Assignment1_Individual/IGN_game_ratings.csv")
# Part a: Determine the z-score of "Super Mario Kart" and print out result
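# Compute the z-score of every score in the column, then select the Super Mario Kart row(s)
data['zscore'] = stats.zscore(data['Score'])
superMarioKart_zscore = data.loc[data['Game']=='Super Mario Kart', 'zscore']
print("Z-score of Super Mario Kart: ", superMarioKart_zscore)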
# Part b: The top 20 (most common) platforms
# Part c: The average score of all the Shooter games
averageShooterScore = data.loc[data['Genre']=='Shooter', 'Score'].mean()
# Print output
print("The average score of all the Shooter games is: ", averageShooterScore)
# Part d: The top two platforms with the most perfect scores (10)
# Part e: The probability of a game randomly selected that is an RPG
# First find the number of games in the list that are RPGs
totalNumGames = len(data)
numOfRPGGames = 0
for genre in data['Genre']:
    if genre == 'RPG':
        numOfRPGGames += 1
# Divide this by the total number of games to find the probability of selecting one
print("The probability of selecting a game that is an RPG is: ", numOfRPGGames/totalNumGames)
# Part f: The probability of a game randomly selected with a score less than 5
# First find the number of games in the list with a score less than 5 using a for loop:
numScoresLessThan5 = 0
for score in data['Score']:
    if score < 5:
        numScoresLessThan5 += 1
# Divide this by the total number of games to find the probability of selecting one
print("The probability of selecting a game with a score less than 5 is: ", numScoresLessThan5/len(data))
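A minimal sketch of a vectorized alternative, for comparison: comparing a column to a value gives a boolean Series, and its mean is the fraction of rows where the condition is True, which is the probability of picking such a game at random. The DataFrame below is hypothetical test data standing in for the CSV; only the column names `Game`, `Genre` and `Score` come from the code above.

```python
import pandas as pd
from scipy import stats

# Hypothetical sample data; replace with pd.read_csv(...) on the real file
data = pd.DataFrame({
    "Game": ["Super Mario Kart", "Doom", "Final Fantasy VI", "Bad Game"],
    "Genre": ["Racing", "Shooter", "RPG", "RPG"],
    "Score": [9.0, 8.0, 10.0, 3.0],
})

# Part a: z-score of every score relative to the whole column (ddof=0 by default)
data["zscore"] = stats.zscore(data["Score"])
superMarioKart_zscore = data.loc[data["Game"] == "Super Mario Kart", "zscore"].iloc[0]

# Part e: the mean of a boolean mask is the fraction of rows that are True
prob_rpg = (data["Genre"] == "RPG").mean()

# Part f: same idea for scores below 5
prob_below_5 = (data["Score"] < 5).mean()

print(superMarioKart_zscore, prob_rpg, prob_below_5)
```

Note that `stats.zscore` uses the population standard deviation (`ddof=0`) by default; pass `ddof=1` if your course expects the sample standard deviation.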
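Pandas has great built-in functions for this kind of problem. Here is a suggestion for solving part b, using some test data I imported from a CSV. The test.csv I used only has these fields, but the approach still works once you change the column name and import your own file.
Sample CSV structure: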
# Import relevant packages/modules
import numpy as np
import pandas as pd
# Import a dummy csv data file
data = pd.read_csv("./test.csv")
# Visualize the file before the process
print(data)
# Extract the column you're interested in counting
initial_column = data['Name']
# Create object for receiving the output of the value_counts function
count_object = initial_column.value_counts()
# Create an empty list for receiving the sorted values
sorted_grouped_column = []
# You choose the number of items. In your exercise it is 20.
number_of_items = 3
counter = 0
for i in count_object.keys():
    if counter == number_of_items:
        break
    else:
        sorted_grouped_column.append(i)
        counter = counter + 1
print(sorted_grouped_column)
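A shorter sketch of the same idea: `value_counts()` already sorts the counts in descending order, so `head(n)` can replace the counter loop. Filtering to perfect scores first and then counting again is one possible approach to part d. The data here is made up, and the `Platform` column name is an assumption, since the real file's columns aren't shown.

```python
import pandas as pd

# Hypothetical data; the "Platform" column name is an assumption
data = pd.DataFrame({
    "Platform": ["PS4", "PS4", "PS4", "PS4", "PC", "PC", "PC", "Xbox", "Xbox", "Wii"],
    "Score": [10.0, 10.0, 10.0, 7.0, 10.0, 10.0, 8.0, 10.0, 6.0, 9.0],
})

# Part b: value_counts() sorts by frequency (descending), so head(n) gives the top n
number_of_items = 3  # 20 in the original exercise
top_platforms = data["Platform"].value_counts().head(number_of_items).index.tolist()

# Part d: keep only perfect scores, then count platforms the same way
perfect = data.loc[data["Score"] == 10, "Platform"]
top_two_perfect = perfect.value_counts().head(2)

print(top_platforms)
print(top_two_perfect)
```

If two platforms tie on count, their relative order is not something to rely on, so check the counts themselves when the cutoff falls on a tie.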