I'm very new to Python, so there may be a simple solution here. I'm trying to clean a data set about rent prices and square footage in a pandas DataFrame. My bedrooms column includes information about bedrooms AND square feet. Most of the entries are formatted like "/ 1br - 950ft²", but some are "/ 1br" and some are "/950ft²". I'm trying to create a clean column with just bedrooms, but because of the inconsistent formatting I can't just split the string after a certain character.
I've decided I need to write a function that tests whether the string contains "br", but I'm getting an error.
Here's my code:
def cleaned_bedrooms(x):
    if df[df['bedrooms'].str.contains('br')]:
        df['bedrooms'] = df['bedrooms'].str.split('-').str[0]
    else:
        return None
df['bedrooms'].map(cleaned_bedrooms)
I seem to have set up some kind of boolean test (I assume it's triggered by the if statement), because the line containing .map(cleaned_bedrooms) raises "ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()."
If this is your dataframe,
bedrooms
0 / 1br - 950ft²
1 / 1br
2 /950ft²
You can use str.extract with a regular expression to pull out the bedrooms part (a raw string keeps \d from being treated as a string escape):
df['bedrooms'] = df['bedrooms'].str.extract(r'(\d+br)', expand=False)
You get
bedrooms
0 1br
1 1br
2 NaN
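As for the ValueError in the question: map calls the function once per cell, so x is already a single string, but the function ignores x and puts a filtered DataFrame in the if statement, which pandas will not reduce to a single True or False. Below is a minimal sketch of an element-wise version, assuming the same three sample rows. The helper name extract_bedrooms and the column name bedrooms_clean are made up for illustration and are not from the original post.

```python
import re

import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({"bedrooms": ["/ 1br - 950ft²", "/ 1br", "/950ft²"]})


def extract_bedrooms(value):
    """Return the bedroom part (e.g. '1br') of one cell, or None if absent."""
    if not isinstance(value, str):
        # Missing cells are not strings, so there is nothing to search.
        return None
    match = re.search(r"\d+br", value)
    return match.group(0) if match else None


# map passes each cell to the function and collects the return values.
# (bedrooms_clean is a hypothetical column name, kept separate so the
# raw column stays available for comparison.)
df["bedrooms_clean"] = df["bedrooms"].map(extract_bedrooms)

# Vectorized equivalent from the answer, for comparison.
vectorized = df["bedrooms"].str.extract(r"(\d+br)", expand=False)

print(df)
```

On a large column the vectorized str.extract call is likely the better choice; the map version is mainly useful for seeing what the original function was meant to do.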