Add a new column to a pandas DataFrame with converted values from another column?

Question

I have a pandas DataFrame loaded from a CSV file, and I want to add 3 columns using Python 3.8:

1. Add a column that converts meters to miles (length is in meters; the new column will be length_miles).
2. Add a column that converts meters to feet (elevation_gain is in meters; the new column will be elevation_gain_feet).
3. Add a column that computes a difficulty rating as follows: NPS difficulty rating = elevation gain (feet) x 2 x distance (miles). The square root of that product is the numerical rating.

This needs to be broken down a little further into a difficulty rating of 1-5. The current difficulty rating in the data set is not informative, so I want to use the National Park Service rating.

If the numerical difficulty rating is:

- under 50, then the difficulty rating is 1
- 50-100, then the difficulty rating is 2
- 101-150, then the difficulty rating is 3
- 151-200, then the difficulty rating is 4
- above 200, then the difficulty rating is 5

Ideally this would compute and just put the number 1-5 in the column, but having 2 new columns for #3 would be fine as well.

Here are the columns from my DataFrame and values from a couple of rows. I have not yet thought about how to produce the NPS 1-5 ratings; I am not sure whether I can do it inside the DataFrame or need to do it outside the DataFrame in a function.

Unfortunately, the code below does not add the columns the way I want, so I think I must be doing something wrong. This is the code I have so far:

df = pd.read_csv('data.csv')
df.assign(length_miles = lambda x: x['length'] * 0.00062137, axis = 1)
df.assign(elevation_gain_ft = lambda x: x['elevation_gain'] * 3.28084, axis = 1)
df.assign(num_dif_rating = lambda x: np.sqrt( x['length_miles'] * 2 * x['elevation_gain_ft'], axis = 1))

Answer 1

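You need to use the assign method and keep the DataFrame it returns. Note that assign has no axis argument; every keyword you pass becomes a column:

df = df.assign(YourColumn = lambda x: conversion_formula(x['Meters']))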

Here's the link to the documentation:

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.assign.html
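
As a rough sketch of how this could look for the three columns in the question: assign returns a new DataFrame rather than changing df in place, and a stray axis = 1 keyword would simply add a column named axis. The sample rows and the constant names below are made up for illustration; they are not taken from the actual CSV.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows standing in for the CSV (values are invented).
df = pd.DataFrame({
    "length": [8000.0, 1609.344],       # meters
    "elevation_gain": [500.0, 30.48],   # meters
})

# Illustrative constant names; the factors are the ones used in this thread.
METERS_TO_MILES = 0.000621371
METERS_TO_FEET = 3.28084

# assign returns a new DataFrame, so the result is bound back to df.
# Later keyword arguments can use columns created earlier in the same call.
df = df.assign(
    length_miles=lambda d: d["length"] * METERS_TO_MILES,
    elevation_gain_feet=lambda d: d["elevation_gain"] * METERS_TO_FEET,
    num_dif_rating=lambda d: np.sqrt(
        d["elevation_gain_feet"] * 2 * d["length_miles"]
    ),
)

print(df.columns.tolist())
```

Calling df.assign(...) on its own line, without binding the result, leaves df unchanged, which is probably why no new columns appeared in the original attempt.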

Good luck!

Answer 2

I got it to work like this. It cleaned up the data just the way I need.

import numpy as np
import pandas as pd

def data_cleanup():
    df = pd.read_csv('AllTrails data.csv')
    # convert meters to miles and feet and add columns
    df['length_miles'] = df['length'].apply(lambda x: x * 0.000621371)
    df['elevation_gain_feet'] = df['elevation_gain'].apply(lambda x: x * 3.28084)

    def difficulty_rating(x, y):
        # numerical rating: square root of (miles * feet * 2)
        res = np.sqrt(x * y * 2)
        # upper bounds are inclusive, so values that fall between whole
        # numbers (for example 100.5) are not pushed into the last bucket
        if res < 50:
            return 1
        elif res <= 100:
            return 2
        elif res <= 150:
            return 3
        elif res <= 200:
            return 4
        else:
            return 5

    # apply the rating row by row (axis=1 passes each row to the lambda)
    df['nps_difficulty_rating'] = df.apply(lambda x: difficulty_rating(x.length_miles, x.elevation_gain_feet), axis=1)

    df.to_csv('np trails.csv')

data_cleanup()

2 answers

solution1
1 2020-11-27 22:39:32

solution2
0 ACCEPTED 2020-11-28 18:43:00
