
How to search and identify a float value in a csv file using pandas?

I have a CSV file that contains both string and float values, like this:

"c1","c2","c3"
"A","1.3334343434","1"
"B","2","6.434343443434"
"D","3.434344343443","P"
"B","2.446647884844","Z"
"A","2","1.98984934394943"

I need to identify only the float values in this file and round them to 2 decimal places. When I load the file as a pandas DataFrame, I get an error, and I am not sure how to identify the float values in order to apply round(). Any suggestions? Thanks.

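For your c2 column we can use round() directly, since pandas already reads it as a float column.

The c3 column is still stored as strings, so there we can match the decimal numbers with a regular expression and keep only their first four characters. Note that this truncates instead of rounding (1.98984934394943 becomes 1.98), and it assumes a single digit before the decimal point.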

# Print initial df
  c1        c2                c3
0  A  1.333434                 1
1  B  2.000000    6.434343443434
2  D  3.434344                 P
3  B  2.446648                 Z
4  A  2.000000  1.98984934394943

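import numpy as np
# c2 is already float; c3 holds strings, so matching decimals are cut to 4 characters
df['c2'] = df['c2'].round(2)
df['c3'] = np.where(df['c3'].str.match(r'\d\.\d+'), df['c3'].str.slice(stop=4), df['c3'])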

  c1    c2    c3
0  A  1.33     1
1  B  2.00  6.43
2  D  3.43     P
3  B  2.45     Z
4  A  2.00  1.98
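If rounded (rather than truncated) text is wanted in c3, one possible variation is sketched below. It is only a sketch: it rebuilds the question's data from an inline string, and the helper names csv_text, numeric_c3, is_decimal and rounded_text are made up for this illustration. It uses pd.to_numeric to parse the strings and writes back only the values that contain a decimal point.

```python
import io
import pandas as pd

# The sample data from the question, inlined so the sketch runs on its own
csv_text = '''"c1","c2","c3"
"A","1.3334343434","1"
"B","2","6.434343443434"
"D","3.434344343443","P"
"B","2.446647884844","Z"
"A","2","1.98984934394943"
'''
df = pd.read_csv(io.StringIO(csv_text))

# Parse c3 as numbers; entries that are not numeric (P, Z) become NaN
numeric_c3 = pd.to_numeric(df['c3'], errors='coerce')

# Treat a value as a float only if it parsed and contains a decimal point
is_decimal = numeric_c3.notna() & df['c3'].str.contains('.', regex=False)

# Round, format back to text, and replace only the decimal entries
rounded_text = numeric_c3.round(2).map('{:.2f}'.format)
df['c3'] = df['c3'].where(~is_decimal, rounded_text)

print(df)
```

With this variation the last row of c3 ends up as 1.99 instead of 1.98, while 1, P and Z are left exactly as they were.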

If you want column c3 to be of float type as well, the non-numeric values P and Z have to be replaced with NaN:

# Keep values that start like a number, replace the rest with NaN, then convert and round
df['c3'] = np.where(df['c3'].str.match(r'\d\.\d+|\d+'), 
                    df['c3'], 
                    np.nan).astype(float).round(2)

  c1    c2    c3
0  A  1.33  1.00
1  B  2.00  6.43
2  D  3.43   NaN
3  B  2.45   NaN
4  A  2.00  1.99
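The same conversion can probably be written without a regular expression by letting pandas do the parsing. This is a minimal sketch, assuming the c3 values are the strings shown in the question; the names c3 and c3_float are only for illustration.

```python
import pandas as pd

# The c3 column from the question, as strings
c3 = pd.Series(['1', '6.434343443434', 'P', 'Z', '1.98984934394943'])

# errors='coerce' turns anything that cannot be parsed as a number into NaN
c3_float = pd.to_numeric(c3, errors='coerce').round(2)

print(c3_float)
```

Compared with the regex, pd.to_numeric also accepts forms such as negative numbers or scientific notation, which may or may not be what you want.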

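Edit, after the OP's comment about applying this to all columns:

# astype(str) lets .str.match work on columns pandas already parsed as float (such as c2).
# Every value that does not start like a number becomes NaN, so a text column such as c1 ends up all NaN.
for col in df.columns:
    df[col] = np.where(df[col].astype(str).str.match(r'\d\.\d+|\d+'), 
                       df[col], 
                       np.nan).astype(float).round(2)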
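If purely textual columns such as c1 should survive the loop, a possible variation is to convert a column only when at least one of its values parses as a number. This is a sketch under that assumption, with the question's data inlined; csv_text and numeric are illustrative names.

```python
import io
import pandas as pd

# The sample data from the question, inlined so the sketch runs on its own
csv_text = '''"c1","c2","c3"
"A","1.3334343434","1"
"B","2","6.434343443434"
"D","3.434344343443","P"
"B","2.446647884844","Z"
"A","2","1.98984934394943"
'''
df = pd.read_csv(io.StringIO(csv_text))

for col in df.columns:
    # Non-numeric entries become NaN instead of raising an error
    numeric = pd.to_numeric(df[col], errors='coerce')
    if numeric.notna().any():
        # At least one value parsed: store the column as rounded floats
        df[col] = numeric.round(2)
    # Otherwise the column (here c1) is left as it is

print(df)
```

Here c1 keeps its letters, c2 is rounded, and c3 becomes a float column with NaN in place of P and Z.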

A very simple way to do this is to write a small custom function that wraps the conversion in a try/except block, and to run it on every column with apply.

import pandas as pd

data = pd.read_csv('newdata.csv')

print(data)

The data is as provided:

  c1        c2                c3
0  A  1.333434                 1
1  B  2.000000    6.434343443434
2  D  3.434344                 P
3  B  2.446648                 Z
4  A  2.000000  1.98984934394943

Now we create a custom function that receives the DataFrame one column at a time and rounds every value that can be converted to a float to 2 decimal places:

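def change(m):
    # m is one column; k collects the processed values
    k = []
    for x in m:
        try:
            k.append(round(float(x), 2))
        except (TypeError, ValueError):
            # Not convertible to float (e.g. 'P'): keep the original value
            k.append(x)
    return k


data.apply(change)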

And the output is:

  c1    c2    c3
0  A  1.33   1.0
1  B  2.00  6.43
2  D  3.43     P
3  B  2.45     Z
4  A  2.00  1.99

Inside the try block, the function attempts to convert each value to a float. If the conversion succeeds, it appends the value rounded to 2 decimal places; if it raises an error, it appends the original value unchanged (whether it is a string or anything else).
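For reference, the same idea can be sketched as a self-contained script that works one value at a time and uses Series.map inside apply. The names round_if_float, csv_text and rounded are made up for this sketch, and the data is inlined instead of being read from newdata.csv.

```python
import io
import pandas as pd

def round_if_float(value):
    """Return value rounded to 2 decimals if it parses as a float, else unchanged."""
    try:
        return round(float(value), 2)
    except (TypeError, ValueError):
        return value

# The sample data from the question, inlined so the sketch runs on its own
csv_text = '''"c1","c2","c3"
"A","1.3334343434","1"
"B","2","6.434343443434"
"D","3.434344343443","P"
"B","2.446647884844","Z"
"A","2","1.98984934394943"
'''
data = pd.read_csv(io.StringIO(csv_text))

# Apply the helper to every element, column by column
rounded = data.apply(lambda column: column.map(round_if_float))

print(rounded)
```

Note that apply returns a new DataFrame, so the result has to be assigned (here to rounded) if it is needed later. Also be aware that float() accepts strings such as "nan" and "inf", so those would be converted as well.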

The solution suggested by the other user is pretty good too, so go with that one if it suits you better.
