This is what I was asked to do:
Remove the dollar sign and comma from the columns. If necessary, convert these two columns to the appropriate data type.
Since my dataset does not contain values with a $ sign, for the sake of the exercise I am replacing the "." in the reviews-per-month numbers with ",".
def remove_commas(value):
    if pd.isna(value):
        return np.NaN
    else:
        return float(value.replace (".", ","))
df["reviews per month"]=df["reviews_per_month"].apply(lambda x: remove_commas(x))"
Error message 1:
File "/var/folders/vr/bbf8y6555gs306xzf_x7zxf80000gn/T/ipykernel_22769/1957524384.py", line 1
df["reviews per month"]=df["reviews_per_month"].apply(lambda x: remove_commas(x))"
^
SyntaxError: EOL while scanning string literal
Error message 2:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
/opt/anaconda3/lib/python3.9/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3628 try:
-> 3629 return self._engine.get_loc(casted_key)
3630 except KeyError as err:
/opt/anaconda3/lib/python3.9/site-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
/opt/anaconda3/lib/python3.9/site-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'reviews per month'
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
/var/folders/vr/bbf8y6555gs306xzf_x7zxf80000gn/T/ipykernel_22769/969712826.py in <module>
----> 1 df["reviews per month"]
/opt/anaconda3/lib/python3.9/site-packages/pandas/core/frame.py in __getitem__(self, key)
3503 if self.columns.nlevels > 1:
3504 return self._getitem_multilevel(key)
-> 3505 indexer = self.columns.get_loc(key)
3506 if is_integer(indexer):
3507 indexer = [indexer]
/opt/anaconda3/lib/python3.9/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3629 return self._engine.get_loc(casted_key)
3630 except KeyError as err:
-> 3631 raise KeyError(key) from err
3632 except TypeError:
3633 # If we have a listlike key, _check_indexing_error will raise
KeyError: 'reviews per month'
Question: what is the issue? Could it be related to the data type?
For this column, the data type shown is:
reviews_per_month float64
I was expecting this change in this column of the dataset:
from "reviews_per_month: 0.20" to "reviews_per_month: 0,20"
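Because the column is float64, its values are Python floats, and floats have no .replace() method, so the function in the question cannot work on them even once the syntax error is fixed. A minimal sketch, assuming a two-decimal display is wanted (which matches the expected "0,20"), formats each number as text first and then swaps the separator. The sample data, the helper name to_comma_decimal and the new column name reviews_per_month_str are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: a float64 column like the one in the question
df = pd.DataFrame({"reviews_per_month": [0.20, 1.5, np.nan]})

def to_comma_decimal(value: float) -> str:
    # Format with two decimals (an assumption), then swap "." for ","
    return f"{value:.2f}".replace(".", ",")

# na_action="ignore" leaves NaN as NaN instead of passing it to the function
df["reviews_per_month_str"] = df["reviews_per_month"].map(
    to_comma_decimal, na_action="ignore"
)
print(df)
```

Writing the text into a separate column keeps the original float64 column available for calculations.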
No example dataframe was provided, so I have created one for the purpose of this answer.
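Points to note:
- The apply() line had a stray " at the end, which is what raises SyntaxError: EOL while scanning string literal. Because that line never ran, the column "reviews per month" was never created, so reading df["reviews per month"] afterwards raises the KeyError. The lambda x: remove_commas(x) wrapper is also unnecessary; the function can be passed to apply() directly.
- Calling float() on a string that contains a comma, such as "0,20", raises a ValueError, so the result has to stay a string.
- The data type does matter: your column is float64, so its values are floats, which have no .replace() method. The function only works on strings, which is why column B in the sample below stores the numbers as text.
- Side comment: it is not clear why you replace . with , as this changes the type from number to string, which seems suboptimal.
So I made those changes.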
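On the side comment: if the comma is only needed when the data leaves pandas (for example for a spreadsheet in a comma-decimal locale), one option is to keep the column numeric and let DataFrame.to_csv write the decimal mark through its decimal parameter. A minimal sketch, assuming CSV export is the goal; the sample data, the ";" separator and the two-decimal float_format are illustrative choices, not something from the question.

```python
import io

import pandas as pd

# Hypothetical sample data; the column stays numeric (float64)
df = pd.DataFrame({"reviews_per_month": [0.20, 1.5]})

# Write CSV text with "," as the decimal mark; sep=";" avoids clashing with it
buffer = io.StringIO()
df.to_csv(buffer, sep=";", decimal=",", float_format="%.2f", index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

The comma only appears in the exported text, so the dataframe itself can still be used for arithmetic.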
This works:
import numpy as np
import pandas as pd

# Create a sample dataframe; column B holds numbers stored as strings
df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': ['5.5', '6.1', '7.14', '8.2']})

def remove_commas(value: str):
    # Despite its name, this swaps the decimal point for a comma;
    # missing values are passed through as NaN
    if pd.isna(value):
        return np.nan
    else:
        return value.replace(".", ",")

# Apply the function to each value of column B (no lambda wrapper needed)
df['C'] = df['B'].apply(remove_commas)

# Print the resulting dataframe
print(df)
The output is:
   A     B     C
0  1   5.5   5,5
1  2   6.1   6,1
2  3  7.14  7,14
3  4   8.2   8,2
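For the exercise as originally stated (remove the dollar sign and comma from two columns, then convert them to a numeric type), a minimal sketch could look like this. It assumes the two columns hold text such as "$1,250.00"; the column names price and cleaning_fee and their values are hypothetical, since the question's dataset has no $ values.

```python
import numpy as np
import pandas as pd

# Hypothetical listing data: both columns store prices as text with "$" and ","
df = pd.DataFrame({
    "price": ["$1,250.00", "$89.99", np.nan],
    "cleaning_fee": ["$50.00", "$1,000.00", "$0.00"],
})

for col in ["price", "cleaning_fee"]:
    # The regex character class [$,] matches every "$" and every ","
    # (inside a class, "$" is a literal character, not an anchor)
    df[col] = df[col].str.replace(r"[$,]", "", regex=True).astype(float)

print(df.dtypes)
```

Stripping the symbols first is what makes astype(float) possible; NaN entries pass through both steps unchanged.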