I have an Id column in my data frame like this:
a = pandas.DataFrame([12673, 44, 847])
This data has some missing values. If I set keep_default_na=True, the missing values are filled with NaN, the column is read as float, and the values become
12673.0, 44.0, 847.0
which is not desired (I want to drop the NA values and keep the ids as str/object, because an id can be of any length). If I set keep_default_na=False, the missing values come through as empty strings, but then other columns (such as booleans) all become object and I have to compare string values to work out true/false.
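A minimal reproduction of the problem; the CSV contents here are an assumption for illustration (a second column keeps the blank id row from being skipped entirely):

```python
import io
import pandas as pd

# Stand-in for the real file; sample data is assumed.
csv_data = io.StringIO("id,other\n12673,a\n,b\n847,c\n")

df = pd.read_csv(csv_data)  # keep_default_na=True is the default
print(df["id"].dtype)       # float64: the NaN forces the whole column to float
print(df["id"].tolist())    # [12673.0, nan, 847.0]
```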
If you want NaN values, you have to have floats (see https://stackoverflow.com/a/38003951/3841261).
Use keep_default_na=True, then after dropping the NaNs convert the column to integers (or strings).
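A sketch of that approach, assuming the id column is called "id" (an in-memory CSV stands in for your file):

```python
import io
import pandas as pd

# Stand-in for pd.read_csv("test.csv", ...); the contents are assumed.
csv_data = io.StringIO("id,ok\n12673,True\n,False\n847,True\n")

df = pd.read_csv(csv_data, keep_default_na=True)  # missing ids become NaN, column is float
df = df.dropna(subset=["id"])                     # drop rows with a missing id
df["id"] = df["id"].astype(int).astype(str)       # float -> int -> str, no trailing .0
```

Going through int first strips the ".0" that a direct str cast of a float would keep.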
Without a better sample of your data I can't be sure, but maybe this will help:
first read your data preserving the dtypes, then read it again to get the ids right.
If your boolean columns also have missing values (empty strings), you will need to cast those columns with df.astype("bool").
import pandas as pd

df1 = pd.read_csv("test.csv", keep_default_na=True).dropna()  # dtypes preserved, ids become floats
df2 = pd.read_csv("test.csv", keep_default_na=False)          # ids preserved as strings
df1["id"] = df2.loc[df1.index, "id"]                          # restore the intact ids
df = pd.DataFrame(df1.to_dict())                              # round-trip to re-infer dtypes
If you don't want to read the file twice, you can read it once with keep_default_na=False, filter out the rows with empty strings, and then cast every column to its desired dtype.
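A sketch of that single-read variant; the column names ("id", "flag") and file contents are assumptions:

```python
import io
import pandas as pd

# Both the id and the boolean column have gaps, so both are read as strings.
csv_data = io.StringIO("id,flag\n12673,True\n,False\n847,\n44,False\n")

df = pd.read_csv(csv_data, keep_default_na=False)  # missing values stay as ""
df = df[(df != "").all(axis=1)]                    # drop rows with any empty cell
df["flag"] = df["flag"].map({"True": True, "False": False})  # back to real booleans
# "id" is already an object column of intact strings: "12673", "44"
```

Note that a plain astype("bool") would not work here, since any non-empty string (including "False") is truthy; an explicit mapping avoids that trap.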