
Replace all the values in a certain column with certain values using csv reader Python

This question continues from my previous question. Thanks to the help of many people, I was able to modify my code as below.

import csv
with open("SURFACE2", "rb") as infile, open("output.txt", "wb") as outfile:
    reader = csv.reader(infile, delimiter=" ")
    writer = csv.writer(outfile, delimiter=" ")
    for row in reader:
        row[18] = "999"                  

        writer.writerow(row)

I just changed the delimiter from "\t" to " ". With the previous delimiter the code only worked up to row[0]; with " " it works up to row[18].

15.20000           120.60000 98327      get data information here.  SURFACE DATA FROM ??????????? SOURCE    FM-12 SYNOP                                                                                155.00000         1         0         0         0         0         T         F         F   -888888   -888888      20020601030000 100820.00000   

In the data line above, row[18] falls somewhere between 15.20000 and 120.60000.

I am not sure what happens between these two values. Maybe the delimiter changes? Visually, however, I can't see any difference. Is there a way to find out whether the delimiter changed and, if so, do you have any idea how to handle multiple delimiters in one piece of code?

Any idea or help would be really appreciated.

Thank you, Isaac


The results from repr(next(infile)):

'            15.20000           120.60000 98327      get data information here.  SURFACE DATA FROM ??????????? SOURCE    FM-12 SYNOP                                                                                155.00000         1         0         0         0         0         T         F         F   -888888   -888888      20020601030000 100820.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0\n'
'  99070.00000      0    155.00000      0    303.20001      0    297.79999      0      3.00000      0    140.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0\n'
'-777777.00000      0-777777.00000      0      1.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0\n'
'      1      0      0\n'
'            55.10000            -3.60000 03154      get data information here.  SURFACE DATA FROM ??????????? SOURCE    FM-12 SYNOP                                                                                 16.00000         1         0         0         0         0         T         F         F   -888888   -888888      20020601030000-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0\n'
'-888888.00000      0     16.00000      0    281.20001      0    279.89999      0      0.00000      0      0.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0\n'
'-777777.00000      0-777777.00000      0      1.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0\n'
'      1      0      0\n'

As you can see, the first four lines should actually be one line. For some reason, the full line seems to be divided into 4 parts. Do you have any idea why? Thank you, Isaac

NB The file format is discussed on page 19 of this document. This more-or-less agrees with the sample data.

EDIT

OK, after considering the various comments and additional answers, and rereading the original question, it seems that the file in question is not a CSV file. It is weather observation data formatted as "little_r", which uses fixed-width fields padded with spaces. There is not much info available so I'm guessing, but each group of 4 lines seems to comprise a single observation. From your previous question it seems that you want to update the 3rd column in the first line? The other 3 lines would be skipped. Then update the 3rd column in the first line of the next set of 4 lines, and so on.
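The four-line grouping described above could be sketched as follows. This is only an illustration: `records` is a hypothetical helper name, the in-memory `fake_file` stands in for the real file, and the assumption that every observation spans exactly four lines comes from the sample data, not from the format specification.

```python
import io
from itertools import islice


def records(lines, size=4):
    """Yield lists of `size` consecutive lines (the last list may be shorter)."""
    it = iter(lines)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


# Stand-in for the real file: two observations of four lines each.
fake_file = io.StringIO(
    u"header A\nvalues A\nend A\ntail A\n"
    u"header B\nvalues B\nend B\ntail B\n"
)

for record in records(fake_file):
    # record[0] is the line that carries the column to be updated.
    print(record[0].rstrip("\n"))
```

Grouping first makes it explicit which line of each observation is being edited, at the cost of a little more code than a plain line counter.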

An example from the OP:

15.20000           120.60000 98327      get data information here.  SURFACE DATA FROM ??????????? SOURCE    FM-12 SYNOP                                                                                155.00000         1         0         0         0         0         T         F         F   -888888   -888888      20020601030000 100820.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0
  99070.00000      0    155.00000      0    303.20001      0    297.79999      0      3.00000      0    140.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0
-777777.00000      0-777777.00000      0      1.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0
      1      0      0

The first 2 columns of the first line are (I'm guessing) the latitude and longitude for the observations. I have no idea what the 3rd column 98327 is, but this is the column that the OP wants to update (based on previous question).

It's not a CSV file, so don't process it as one. Instead, because the fields are fixed width, we know the offset and width of the field that needs to be updated. Based on the sample data, the 3rd column starts at offset 41 (counting from 0) and is 5 characters wide, i.e. it is the slice line[41:46]. So, to update the data and write to a new file:

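offset_col_3 = 41   # 0-based offset of the 3rd column in the first line
length_col_3 = 5    # width of the 3rd column in characters

with open('SURFACE2') as infile, open('output.txt', 'w') as outfile:
    for line_no, line in enumerate(infile):
        if line_no % 4 == 0:    # every 4th line starting with the first
            # Right-justify the new value so the field width stays unchanged.
            line = '{}{:>{width}}{}'.format(
                line[:offset_col_3], 999, line[offset_col_3 + length_col_3:],
                width=length_col_3)
        outfile.write(line)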
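The same slice-and-rebuild idea can be wrapped in a small helper that refuses values wider than the field, so the rest of the line is never shifted. This is a minimal sketch: `replace_field` is a hypothetical name, and the offset 41 and width 5 are read off the sample data rather than taken from the format specification.

```python
def replace_field(line, offset, width, value):
    """Return `line` with the fixed-width field at [offset, offset + width) replaced."""
    text = str(value)
    if len(text) > width:
        raise ValueError("value %r does not fit in %d characters" % (text, width))
    # Right-justify so the replacement occupies exactly `width` characters.
    return line[:offset] + text.rjust(width) + line[offset + width:]


# Start of a first line, rebuilt from the sample data (assumed layout).
sample = " " * 12 + "15.20000" + " " * 11 + "120.60000" + " 98327      get data\n"

updated = replace_field(sample, 41, 5, 999)
print(repr(updated[36:52]))        # the region around the replaced field
print(len(updated) == len(sample)) # the line length is preserved
```

Checking the width up front matters here because every later field on the line is located purely by position.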

Original answer

Try reading line 20 (row[19]) (assuming no header line in the CSV file, otherwise line 21) from the file and inspecting it in Python:

with open("SURFACE2") as infile:
    for i in range(20):
        print(repr(next(infile)))

The last line displayed will be line 20. If, for example, tabs are the delimiters, then you might see \t in between the columns of data. Compare the previous line to the last line to see if there is a difference in the delimiter used.

If you find that your CSV file is mixing delimiters, then you might have to split the fields manually.
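To see what a single-space delimiter does to space-padded data, it can help to compare the csv reader with plain string splitting. The sketch below uses a shortened line modelled on the sample data (an assumption, not the full record). The csv reader treats every individual space as a field boundary, so runs of padding turn into empty fields, while `str.split()` with no argument collapses runs of whitespace.

```python
import csv

# Shortened line modelled on the sample data (assumed layout).
line = " " * 12 + "15.20000" + " " * 11 + "120.60000" + " 98327"

# Every single space is a delimiter, so padding becomes empty fields.
row = next(csv.reader([line], delimiter=" "))
print(len(row))
print(row[12], repr(row[18]), row[23])

# Splitting on runs of whitespace recovers the three visible values.
print(line.split())

# Whitespace splitting still fails where two fields touch with no space.
fused = "      0-888888.00000      0"
print(fused.split())
```

Here row[18] is one of the empty strings produced by the padding between the first two values, which would explain why assigning to it appears to land "in the middle". Whitespace splitting is not a general fix either, because fields such as 0-888888.00000 in the sample have no separator between them at all.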

The csv module is not the right tool to use when you have fixed-width fields in your file. What you need to do is explicitly use the field lengths to split up the lines. For example:

# This would be your whole file
data = "\n".join([
    "abc  def gh i",
    "jk   lm  n  o",
    "p    q   r  s",
])
field_widths = [5, 4, 3, 1]

def fields(line, field_widths):
    pos = 0
    for length in field_widths:
        yield line[pos:pos + length].strip()
        pos += length

for line in data.split("\n"):
    print(list(fields(line, field_widths)))

will give you:

['abc', 'def', 'gh', 'i']
['jk', 'lm', 'n', 'o']
['p', 'q', 'r', 's']
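If a field has to be changed and the line written back, each value needs to be padded to its original width again. A minimal sketch of that round trip, reusing the toy layout above: `join_fields` is a hypothetical helper, and left-justification is an assumption that suits this toy data (the weather file appears to right-justify its numbers, so `rjust` would be the counterpart there).

```python
field_widths = [5, 4, 3, 1]


def fields(line, field_widths):
    """Split a fixed-width line into stripped field values."""
    pos = 0
    for length in field_widths:
        yield line[pos:pos + length].strip()
        pos += length


def join_fields(values, field_widths):
    """Rebuild a fixed-width line, left-justifying each value in its field."""
    parts = []
    for value, width in zip(values, field_widths):
        text = str(value)
        if len(text) > width:
            raise ValueError("value %r is wider than %d" % (text, width))
        parts.append(text.ljust(width))
    return "".join(parts)


line = "abc  def gh i"
values = list(fields(line, field_widths))
values[1] = "xyz"                      # change the second field
new_line = join_fields(values, field_widths)
print(repr(new_line))
```

The rebuilt line has the same length as the original, so any fields after the edited one stay at their expected offsets.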
