Data Cleaning with Pandas in Python

Question

I am trying to clean a CSV file for data analysis. How do I convert TRUE/FALSE values into 1 and 0?

When I searched Google, the suggestion was df.somecolumn = df.somecolumn.astype(int). However, this CSV file has 100 columns, and not every column is TRUE/FALSE (some are categorical, some are numerical). How can I write one sweeping piece of code that converts every TRUE/FALSE column to 1 and 0 without typing 50 lines of df.somecolumn = df.somecolumn.astype(int)?

Answer 1

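You can select the boolean columns by dtype and convert only those. The result of select_dtypes cannot be assigned to directly, so take its column names first:

bool_cols = df.select_dtypes(include='bool').columns
df[bool_cols] = df[bool_cols].astype(int)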
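
As a rough end-to-end illustration of the dtype-based selection, here is a minimal sketch on a small made-up frame. The column names flag, count and label are hypothetical and only stand in for the 100 real columns.

```python
import pandas as pd

# Hypothetical sample data: one boolean, one numerical, one categorical column.
df = pd.DataFrame({
    "flag": [True, False, True],
    "count": [10, 20, 30],
    "label": ["x", "y", "z"],
})

# Names of the columns whose dtype is bool.
bool_cols = df.select_dtypes(include="bool").columns

# Replace just those columns with their integer versions.
df[bool_cols] = df[bool_cols].astype(int)

print(df.dtypes)
```

Only the flag column should change here; the numerical and categorical columns are left untouched because they are never selected.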

Answer 2

A slightly different approach. First, the dtypes of a DataFrame are available as df.dtypes, which returns a pandas Series that looks like this:

a     int64
b      bool
c    object
dtype: object

Second, we can swap bool for an integer type using replace. Calling

df.dtypes.replace('bool', 'int8')

gives

a     int64
b      int8
c    object
dtype: object

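Finally, a pandas Series behaves much like a dictionary (here it maps column name to dtype), so it can be passed to pd.DataFrame.astype.

We can write it as a one-liner. Note that astype returns a new DataFrame rather than modifying df in place, so assign the result back:

df = df.astype(df.dtypes.replace('bool', 'int8'))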
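
The same idea can also be spelled out with an explicit column-to-dtype dictionary, which may be easier to inspect before applying. This is only a sketch of a variant, not the answer's exact code: the sample frame is made up, and the names mapping and converted are hypothetical.

```python
import pandas as pd
from pandas.api.types import is_bool_dtype

# Hypothetical sample frame matching the dtypes shown above.
df = pd.DataFrame({"a": [1, 2], "b": [True, False], "c": ["x", "y"]})

# Build {column name: target dtype} for the boolean columns only.
mapping = {col: "int8" for col, dtype in df.dtypes.items() if is_bool_dtype(dtype)}

# astype returns a new DataFrame; columns missing from the mapping keep their dtype.
converted = df.astype(mapping)

print(converted.dtypes)
```

Keeping the result in a separate variable leaves the original df unchanged, which can be handy for checking the conversion before overwriting anything.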

Answer 3

If the values are stored as the strings 'TRUE' and 'FALSE', I would do it like this:

df.somecolumn = df.somecolumn.apply(lambda x: 1 if x=="TRUE" else 0)

If you want to iterate through all your columns and check whether they have TRUE/FALSE values, you can do this:

for c in df:
    # `'TRUE' in df[c]` would test the index labels, so check the values instead
    if df[c].isin(['TRUE', 'FALSE']).any():
        df[c] = df[c].apply(lambda x: 1 if x == 'TRUE' else 0)

Note that this approach is case-sensitive, and it won't work well if a column mixes TRUE/FALSE with other values: anything that is not exactly 'TRUE' becomes 0.
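
One possible way to soften both caveats is to convert a column only when every non-missing value is some spelling of TRUE or FALSE, and to compare case-insensitively. The sketch below assumes the values are plain strings; the sample frame and the names mapping and is_flag are hypothetical.

```python
import pandas as pd

# Hypothetical sample data: a clean flag column, a mixed column, a numeric column.
df = pd.DataFrame({
    "active": ["TRUE", "false", "True"],
    "mixed": ["TRUE", "maybe", "FALSE"],
    "score": [1.5, 2.0, 3.5],
})

mapping = {"TRUE": 1, "FALSE": 0}

def is_flag(value):
    # True only for strings that spell TRUE or FALSE in any letter case.
    return isinstance(value, str) and value.upper() in mapping

for col in df.columns:
    values = df[col].dropna()
    # Skip empty columns and columns containing anything other than TRUE/FALSE.
    if len(values) and values.map(is_flag).all():
        # Missing values are passed through unchanged.
        df[col] = df[col].map(lambda v: mapping[v.upper()] if isinstance(v, str) else v)

print(df)
```

With this check, the mixed column is left alone instead of having its other values silently turned into 0.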

3 answers

solution1
4 2019-10-16 17:02:38

solution2
0 2019-10-16 19:30:01

solution3
0 2019-10-16 19:59:34
