SyntaxError: EOL while scanning string literal

Question

This is what I was asked to do:

Remove the dollar sign and comma from the columns. If necessary, convert these two columns to the appropriate data type.

My dataset does not contain values with a $ sign, so for the sake of the exercise I am instead replacing the "." in the reviews-per-month numbers with ",".

def remove_commas(value):
    if pd.isna(value):
        return np.NaN
    else:
        return float(value.replace (".", ","))

df["reviews per month"]=df["reviews_per_month"].apply(lambda x: remove_commas(x))"

Error Message number 1:

File "/var/folders/vr/bbf8y6555gs306xzf_x7zxf80000gn/T/ipykernel_22769/1957524384.py", line 1
df["reviews per month"]=df["reviews_per_month"].apply(lambda x: remove_commas(x))"
^
SyntaxError: EOL while scanning string literal

Error Message number 2:

---------------------------------------------------------------------------

KeyError                                  Traceback (most recent call last)
/opt/anaconda3/lib/python3.9/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3628             try:
-> 3629                 return self._engine.get_loc(casted_key)
3630             except KeyError as err:

/opt/anaconda3/lib/python3.9/site-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

/opt/anaconda3/lib/python3.9/site-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'reviews per month'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
/var/folders/vr/bbf8y6555gs306xzf_x7zxf80000gn/T/ipykernel_22769/969712826.py in <module>
----> 1 df["reviews per month"]

/opt/anaconda3/lib/python3.9/site-packages/pandas/core/frame.py in __getitem__(self, key)
3503             if self.columns.nlevels > 1:
3504                 return self._getitem_multilevel(key)
-> 3505             indexer = self.columns.get_loc(key)
3506             if is_integer(indexer):
3507                 indexer = [indexer]

/opt/anaconda3/lib/python3.9/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3629                 return self._engine.get_loc(casted_key)
3630             except KeyError as err:
-> 3631                 raise KeyError(key) from err
3632             except TypeError:
3633                 # If we have a listlike key, _check_indexing_error will raise

KeyError: 'reviews per month'

Question: what is the issue? Could it be related to the data type?

The dtype displayed for this column is:

reviews_per_month                               float64

def remove_commas(value):
    if pd.isna(value):
        return np.NaN
    else:
        return float(value.replace (".", ","))

df["reviews per month"]=df["reviews_per_month"].apply(lambda x: remove_commas(x))"

I was expecting the values in this column to change like this:

from "reviews_per_month: 0.20" to "reviews_per_month: 0,20"

Answer 1

No example dataframe was provided, so I created one for the purpose of this answer.

Points to note:

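The line that calls df.apply() ends with a stray double quote ("), which is what raises the SyntaxError. Because that assignment never ran, the column "reviews per month" was never created, hence the KeyError. The lambda wrapper is also unnecessary: the function can be passed to apply() directly.
Calling float() on a string that contains a comma, such as "0,20", would fail with a ValueError.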
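
To make the second point concrete, here is a minimal sketch. The sample values and the helper name try_float are made up for illustration and are not taken from the asker's dataset. It shows that float() rejects a comma-decimal string, and that a plain float (which is what a float64 column holds) has no .replace method in the first place:

```python
# Hypothetical sample values, not taken from the asker's dataset.
def try_float(text):
    """Return the parsed float, or the name of the exception that was raised."""
    try:
        return float(text)
    except ValueError as err:
        return type(err).__name__

dot_result = try_float("0.20")    # a dot as decimal separator parses fine
comma_result = try_float("0,20")  # a comma as decimal separator is rejected

# A float has no .replace method, so value.replace(...) on a float64 value fails as well.
has_replace = hasattr(0.20, "replace")

print(dot_result, comma_result, has_replace)
```

So even with the stray quote removed, the original function would probably still fail on a float64 column, because .replace is a string method.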

Side comment: it is not clear why you would replace "." with ",", as this changes the values from numbers to strings, which seems suboptimal.
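
As a rough illustration of that side comment, the sketch below keeps the numeric column for calculations and adds a separate text column for display. The dataframe, the helper to_comma_string, the new column name, and the two-decimal format are all assumptions made for this example:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's float64 column.
df = pd.DataFrame({"reviews_per_month": [0.20, 1.5, np.nan]})

def to_comma_string(value):
    """Format a float with two decimals and a comma as decimal separator."""
    if pd.isna(value):
        return value  # leave missing values untouched
    return f"{value:.2f}".replace(".", ",")

# Keep the numeric column as it is and put the formatted text in a new column.
df["reviews_per_month_text"] = df["reviews_per_month"].apply(to_comma_string)

print(df)
print(df.dtypes)
```

The original column stays float64, while the new column holds strings and is no longer usable for arithmetic.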

With those points addressed, I made the following changes.

This works:

import numpy as np
import pandas as pd

# Create a sample dataframe
df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': ['5.5', '6.1', '7.14', '8.2']})

# Replace the decimal point with a comma, leaving missing values as NaN
def remove_commas(value: str):
    if pd.isna(value):
        return np.nan
    else:
        return value.replace(".", ",")

# Apply the function to each value in column 'B'
df['C'] = df['B'].apply(remove_commas)

# Print the resulting dataframe
print(df)

The output is:

   A     B     C
0  1   5.5   5,5
1  2   6.1   6,1
2  3  7.14  7,14
3  4   8.2   8,2
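
For the exercise as originally stated (removing the dollar sign and comma, then converting the type), a minimal sketch could look like the following. The column name "price" and the sample values are assumptions, since the asker's dataset has no such column:

```python
import pandas as pd

# Hypothetical data; the column name "price" is an assumption.
df = pd.DataFrame({"price": ["$1,200.00", "$85.50", None]})

# Strip "$" and "," using a regular-expression character class.
cleaned = df["price"].str.replace(r"[$,]", "", regex=True)

# Convert the cleaned strings to numbers; missing values stay missing.
df["price"] = pd.to_numeric(cleaned)

values = df["price"].tolist()
print(values)
```

Here the vectorized .str.replace is used instead of apply(), which avoids writing a separate helper function for a simple substitution.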
