pandas dataframe column contains string and int

My DataFrame's age column looks like this:

20 or younger =14

61 or older =45

56-60 = 34

31-35 =30

56 or older =31

21-25 =23

26 30 =56

31 35 =44

36 40 =32

21 25 =26

26-30 =14

46 50 =14

36-40 =15

46-50 =33

41 45 =24

41-45 =29

51-55 =35

So I wrote this function to categorize it better, but I got a TypeError that says: '<' not supported between instances of 'str' and 'int'

def age_buckets(x):
    if x < 30:
        return '18-29'
    elif x < 40:
        return '30-39'
    elif x < 50:
        return '40-49'
    elif x < 60:
        return '50-59'
    elif x < 70:
        return '60-69'
    elif x >= 70:
        return '70+'
    else:
        return 'other'

Here is a link to what I am doing: https://deepnote.com/workspace/eddie-abfa350f-f15e-43fe-8960-fab53a2def2e/project/Welcome-e6ac66b9-19f2-4973-bbc2-7adfda9366f3/%2FReasons%20for%20resignation%20analysis.ipynb

You can't compare a string with a number using the < check; Python doesn't associate that string with a number. The error says that the incoming x value is a string, so for the comparison to work, x must be a number. If the string holds only digits, you can convert it with the int() function, such as int(x) < 30 ... Note that int() raises a ValueError for a label like '26-30', so this only works for purely numeric values.
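The difference can be checked in plain Python. This is a minimal sketch with made-up sample values ('25' and '26-30'), not the data from the linked notebook:

```python
x = '25'  # a string, as it would arrive from an object-dtype column

# Comparing a str with an int raises TypeError.
try:
    x < 30
    compared = True
except TypeError as err:
    compared = False
    print(err)

# Converting first makes the comparison valid.
is_under_30 = int(x) < 30

# int() only accepts purely numeric text; a range label fails.
try:
    int('26-30')
    converted = True
except ValueError:
    converted = False
```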

It would be better to pass age_buckets an int rather than a string. So when you call it, use age_buckets(int(x)) rather than age_buckets(x).

Please see : https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.apply.html

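So when you do combined['age'] = combined['age'].apply(age_buckets(int(x))), you are calling the function yourself and handing its result to apply. apply expects the function itself and calls it once per value, so you actually need to do combined['age'] = combined['age'].apply(age_buckets) and do the int() conversion inside the function.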
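As a sketch of passing the function object to apply: the DataFrame below is a hypothetical stand-in for combined with purely numeric strings, and the function is shortened to three buckets for brevity.

```python
import pandas as pd

def age_buckets(y):
    x = int(y)  # works only for purely numeric values such as '25'
    if x < 30:
        return '18-29'
    elif x < 40:
        return '30-39'
    return '40+'  # shortened here; the question's function has more buckets

# Hypothetical sample data, not the real column from the question.
combined = pd.DataFrame({'age': ['25', '34', '61']})

# Pass the function itself; apply calls it once per value.
combined['age_group'] = combined['age'].apply(age_buckets)
```

Writing the result to a new column (age_group is an assumed name) keeps the original values available for checking.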

See if this works:

def age_buckets(y):
    x = int(y)
    if x < 30:
        ...
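The labels shown in the question are ranges such as '26-30', '26 30' or '61 or older' rather than plain numbers. One possible workaround, sketched here under the assumption that the first number in each label is a good enough stand-in for the age, is to extract that number before bucketing. The helper name first_number is hypothetical.

```python
import re

def first_number(label):
    """Return the first run of digits in label as an int, or None if there is none."""
    match = re.search(r"\d+", str(label))
    return int(match.group()) if match else None

def age_buckets(label):
    x = first_number(label)
    if x is None:
        return 'other'  # missing or non-numeric labels
    if x < 30:
        return '18-29'
    elif x < 40:
        return '30-39'
    elif x < 50:
        return '40-49'
    elif x < 60:
        return '50-59'
    elif x < 70:
        return '60-69'
    return '70+'

labels = ['20 or younger', '26-30', '26 30', '41-45', '61 or older', None]
buckets = [age_buckets(label) for label in labels]
```

On the real column this would be used as combined['age'].apply(age_buckets). Because the lower bound decides the bucket, a label like '26-30' lands in '18-29'; using the upper bound or the midpoint instead is an equally defensible choice.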
