How can I specify a different decimal format on each column when using Pandas DataFrame to CSV?

I am parsing specific columns from a text file with data that looks like this:

  n Elapsed time  TimeUTC HeightMSL GpsHeightMSL     P   Temp RH   Dewp   Dir Speed Ecomp Ncomp       Lat        Lon
                s hh:mm:ss         m            m   hPa     °C  %     °C     °   m/s   m/s   m/s         °          °
   1            0 23:15:43       198          198 978.5  33.70 47  20.87 168.0   7.7  -1.6   7.6 32.835222 -97.297940
   2            1 23:15:44       202          201 978.1  33.03 48  20.62 162.8   7.3  -2.2   7.0 32.835428 -97.298000
   3            2 23:15:45       206          206 977.6  32.89 48  20.58 160.8   7.5  -2.4   7.0 32.835560 -97.298077
   4            3 23:15:46       211          211 977.1  32.81 49  20.58 160.3   7.8  -2.6   7.4 32.835660 -97.298160
   5            4 23:15:47       217          217 976.5  32.74 49  20.51 160.5   8.3  -2.7   7.8 32.835751 -97.298242
   6            5 23:15:48       223          223 975.8  32.66 48  20.43 160.9   8.7  -2.8   8.2 32.835850 -97.298317

I perform one calculation on the first m/s column (converting m/s to kt) and write all data where hpa > 99.9 to an output file. That output looks like this:

978.5,198,33.7,20.87,168.0,14.967568
978.1,201,33.03,20.62,162.8,14.190032
977.6,206,32.89,20.58,160.8,14.5788
977.1,211,32.81,20.58,160.3,15.161952
976.5,217,32.74,20.51,160.5,16.133872
975.8,223,32.66,20.43,160.9,16.911407999999998

The code executes fine and the output file works for what I'm using it for, but is there a way to format the column output to a specific decimal place? As you can see in my code, I've tried df.round but it doesn't impact the output. I've also looked at float_format parameter, but that seems like it would apply the format to all columns. My intended output should look like this:

978.5, 198, 33.7, 20.9, 168, 15
978.1, 201, 33.0, 20.6, 163, 14
977.6, 206, 32.9, 20.6, 161, 15
977.1, 211, 32.8, 20.6, 160, 15
976.5, 217, 32.7, 20.5, 161, 16
975.8, 223, 32.7, 20.4, 161, 17

My code is below:

import pandas as pd

headers = ['n', 's', 'time', 'm1', 'm2', 'hpa', 't', 'rh', 'td', 'dir', 'spd', 'u', 'v', 'lat', 'lon']
df = pd.read_csv('edt_20220520_2315.txt', encoding_errors = 'ignore', skiprows = 2, sep = r'\s+', names = headers)

df['spdkt'] = df['spd'] * 1.94384

df['hpa'].round(decimals = 1)
df['spdkt'].round(decimals = 0)
df['t'].round(decimals = 1)
df['td'].round(decimals = 1)
df['dir'].round(decimals = 0)

extract = ['hpa', 'm2', 't', 'td', 'dir', 'spdkt']

with open('test_output.txt' , 'w') as fh:
    df_to_write = df[df['hpa'] > 99.9]
    df_to_write.to_csv(fh, header = None, index = None, columns = extract, sep = ',')

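You can pass a dictionary of decimal places per column to DataFrame.round, then cast the columns rounded to 0 decimals to integers so they are written without a trailing ".0". Note that round returns a new object rather than modifying the data in place, which is why the separate df['hpa'].round(...) calls in the question had no effect; the result has to be assigned back: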

d = {'hpa':1, 'spdkt':0, 't':1, 'td':1, 'dir':0}
df = df.round(d).astype({k:'int' for k, v in d.items() if v == 0})

print (df)
   n  s      time   m1   m2    hpa     t  rh    td  dir  spd    u    v  \
0  1  0  23:15:43  198  198  978.5  33.7  47  20.9  168  7.7 -1.6  7.6   
1  2  1  23:15:44  202  201  978.1  33.0  48  20.6  163  7.3 -2.2  7.0   
2  3  2  23:15:45  206  206  977.6  32.9  48  20.6  161  7.5 -2.4  7.0   
3  4  3  23:15:46  211  211  977.1  32.8  49  20.6  160  7.8 -2.6  7.4   
4  5  4  23:15:47  217  217  976.5  32.7  49  20.5  160  8.3 -2.7  7.8   
5  6  5  23:15:48  223  223  975.8  32.7  48  20.4  161  8.7 -2.8  8.2   

         lat        lon  spdkt  
0  32.835222 -97.297940     15  
1  32.835428 -97.298000     14  
2  32.835560 -97.298077     15  
3  32.835660 -97.298160     15  
4  32.835751 -97.298242     16  
5  32.835850 -97.298317     17  
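
To connect this back to the CSV output the question asks about, here is a minimal sketch of the whole round-then-write step. It assumes a small hand-built DataFrame with the first three rows from the question instead of the real input file, and it writes to an in-memory io.StringIO buffer (my stand-in for 'test_output.txt') so the result can be inspected directly.

```python
import io

import pandas as pd

# Stand-in for the parsed file: first three rows copied from the question.
df = pd.DataFrame({
    'hpa': [978.5, 978.1, 977.6],
    'm2': [198, 201, 206],
    't': [33.70, 33.03, 32.89],
    'td': [20.87, 20.62, 20.58],
    'dir': [168.0, 162.8, 160.8],
    'spd': [7.7, 7.3, 7.5],
})
df['spdkt'] = df['spd'] * 1.94384  # m/s to kt, as in the question

# Decimal places per column. Columns rounded to 0 places are cast to int
# so that to_csv writes "168" rather than "168.0".
d = {'hpa': 1, 'spdkt': 0, 't': 1, 'td': 1, 'dir': 0}
df = df.round(d).astype({k: 'int' for k, v in d.items() if v == 0})

extract = ['hpa', 'm2', 't', 'td', 'dir', 'spdkt']

# io.StringIO stands in for the output file handle.
buffer = io.StringIO()
df[df['hpa'] > 99.9].to_csv(buffer, header=False, index=False, columns=extract)
csv_text = buffer.getvalue()
print(csv_text)
```

With these three rows the first line written is 978.5,198,33.7,20.9,168,15. Two caveats: to_csv does not put a space after each comma, so the output is not character-for-character the "intended output" from the question, and round uses round-half-to-even, which is why 160.5 becomes 160 rather than 161 in the printed frame above.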
