How do I create a function that can replace 0.0 with NaN, remove underscores, and convert the cleaned strings to the float datatype, or otherwise return the converted data?
So far I have tried the following:
import numpy as np

def score_cleaner(underscored):
    if underscored == '_000':
        return np.nan

long_data['Numeric Score'] = long_data['Score'].apply(lambda x: float(x.replace('_', '')))
long_data['Numeric Score'] = long_data['Score'].apply(score_cleaner)
However, this results either in a column of nothing but NaNs or in all the numeric values, rather than the combination I want, where 0.0 values become NaN and the rest of the data is left alone:
     PID_Sex  PID_Age  Manipulation  Score  Face ID  Condition    Numeric Score
103  Female   18       Symmetry      _005   101      Manipulated  NaN
106  Female   19       Symmetry      _000   101      Manipulated  NaN
106  Male     22       Symmetry      _000   101      Manipulated  NaN
109  Male     20       Symmetry      _000   101      Manipulated  NaN
112  Female   18       Symmetry      _000   101      Manipulated  NaN
115  Female   18       Symmetry      _000   101      Manipulated  NaN
118  Female   19       Symmetry      _003   101      Manipulated  NaN
121  Female   18       Symmetry      _000   101      Manipulated  NaN
124  Female   19       Symmetry      _004   101      Manipulated  NaN
127  Female   19       Symmetry      _005   101      Manipulated  NaN

     PID_Sex  PID_Age  Manipulation  Score  Face ID  Condition    Numeric Score
103  Female   18       Symmetry      _005   101      Manipulated  5.0
106  Female   19       Symmetry      _000   101      Manipulated  0.0
106  Male     22       Symmetry      _000   101      Manipulated  0.0
109  Male     20       Symmetry      _000   101      Manipulated  0.0
112  Female   18       Symmetry      _000   101      Manipulated  0.0
115  Female   18       Symmetry      _000   101      Manipulated  0.0
118  Female   19       Symmetry      _003   101      Manipulated  3.0
121  Female   18       Symmetry      _000   101      Manipulated  0.0
124  Female   19       Symmetry      _004   101      Manipulated  4.0
127  Female   19       Symmetry      _005   101      Manipulated  5.0
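Both outputs follow from the attempted code itself. The lambda assignment produces the all-numeric column, and the second assignment then overwrites it. Because score_cleaner has no else branch, Python implicitly returns None for every score other than '_000', and pandas shows those as missing. A minimal sketch reproducing this on a few scores copied from the tables (the small sample Series is an assumption for illustration):

```python
import numpy as np
import pandas as pd

# A few raw scores taken from the question's tables (sample data for illustration).
scores = pd.Series(['_005', '_000', '_003', '_000', '_004'])

def score_cleaner(underscored):
    # Same logic as the attempt in the question: there is no else branch,
    # so every value other than '_000' falls through and returns None.
    if underscored == '_000':
        return np.nan

print(score_cleaner('_005'))  # None

# Every element is either NaN or None, so pandas treats the whole column as missing.
result = scores.apply(score_cleaner)
print(result.isna().all())  # True
```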
I don't fully understand what you want, so here are the two most likely options:
Convert the column to float type, with '_000' converted to np.nan and the rest converted to numeric values:
long_data['Numeric Score'] = long_data['Score'].str.replace('_', '').astype(float).replace(0., np.nan)
or as a function definition:
def score_cleaner(underscore):
    return underscore.str.replace('_', '').astype(float).replace(0., np.nan)

long_data['Numeric Score'] = score_cleaner(long_data['Score'])
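A minimal sketch of the first option on a small sample of the Score column (the sample values are an assumption taken from the tables in the question). Note that replace(0., np.nan) turns every value that parses to zero into NaN, not only the literal string '_000'. The second function swaps in pd.to_numeric with errors='coerce', which is not part of the answer: it turns any string that still is not a number into NaN instead of raising ValueError, which may help if the real data contains stray entries.

```python
import numpy as np
import pandas as pd

# Sample of the 'Score' column (assumed to be strings like '_005').
long_data = pd.DataFrame({'Score': ['_005', '_000', '_003', '_000', '_004']})

def score_cleaner(underscore):
    # Strip underscores, parse as float, then turn every 0.0 into NaN.
    # regex=False treats '_' as a literal string rather than a pattern.
    return underscore.str.replace('_', '', regex=False).astype(float).replace(0.0, np.nan)

def score_cleaner_coerce(underscore):
    # Variant (not from the answer): unparseable strings become NaN
    # instead of raising ValueError; zeros are still mapped to NaN.
    cleaned = pd.to_numeric(underscore.str.replace('_', '', regex=False), errors='coerce')
    return cleaned.replace(0.0, np.nan)

long_data['Numeric Score'] = score_cleaner(long_data['Score'])
print(long_data)
```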
Convert the column to object type, with '_000' converted to the string 'NaN', and leave the rest as it is:
long_data['Numeric Score'] = long_data['Score'].str.replace('_000', 'NaN')
and again defined as a function:
def score_cleaner(underscore):
    return underscore.str.replace('_000', 'NaN')

long_data['Numeric Score'] = score_cleaner(long_data['Score'])
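With the second option the column keeps holding strings, so 'NaN' is ordinary text rather than a real missing value, and the other scores keep their underscores. A short sketch on assumed sample data makes the difference visible with isna():

```python
import pandas as pd

# Small sample of the 'Score' column (assumed values from the question).
long_data = pd.DataFrame({'Score': ['_005', '_000', '_003']})

def score_cleaner(underscore):
    # Option 2: swap the literal text '_000' for the text 'NaN'.
    return underscore.str.replace('_000', 'NaN', regex=False)

long_data['Numeric Score'] = score_cleaner(long_data['Score'])

# 'NaN' here is a three-letter string, so pandas does not count it as missing.
print(long_data['Numeric Score'].tolist())      # ['_005', 'NaN', '_003']
print(long_data['Numeric Score'].isna().sum())  # 0
```

If the goal is a numeric column for calculations, the first option is likely the better fit.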
You can use this:
long_data['Numeric Score'] = long_data['Score'].apply(lambda x: float(x.replace('_', '')))
long_data.loc[long_data['Score'] == '_000', 'Numeric Score'] = np.nan
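The same "parse everything, then blank out '_000'" idea can also be written with Series.mask, which avoids chained indexing entirely (this is a swap-in, not the answer's code). The hypothetical extra value '_00' shows how it differs from the replace(0., np.nan) approach: only the exact string '_000' becomes NaN here, while other zero scores stay 0.0.

```python
import numpy as np
import pandas as pd

# Sample scores; '_00' is a hypothetical value added to show the difference.
long_data = pd.DataFrame({'Score': ['_005', '_000', '_003', '_00']})

# Parse every score, then replace with NaN only the rows whose raw text is exactly '_000'.
numeric = long_data['Score'].str.replace('_', '', regex=False).astype(float)
long_data['Numeric Score'] = numeric.mask(long_data['Score'] == '_000')

print(long_data['Numeric Score'].tolist())  # [5.0, nan, 3.0, 0.0]
```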
To wrap this in a function, you could use this:
def score_cleaner(underscored):
    if underscored == '_000':
        return np.nan
    else:
        return float(underscored.replace('_', ''))

long_data['Numeric Score'] = long_data['Score'].map(score_cleaner)
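One caveat with the element-wise function: if the Score column ever contains a real missing value, calling .replace on it fails, because NaN is a float and has no string methods. A sketch of a guarded version, assuming such missing cells can occur (the None in the sample is hypothetical):

```python
import numpy as np
import pandas as pd

def score_cleaner(underscored):
    # Assumption: some cells may already be missing (NaN/None); pass them
    # through instead of calling .replace on a non-string value.
    if pd.isna(underscored):
        return np.nan
    if underscored == '_000':
        return np.nan
    return float(underscored.replace('_', ''))

# Sample data; the None entry is a hypothetical missing score.
long_data = pd.DataFrame({'Score': ['_005', '_000', None, '_004']})
long_data['Numeric Score'] = long_data['Score'].map(score_cleaner)
print(long_data)
```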