
Ignore new line in a column when reading a csv

I am having a problem processing a csv that I'm loading into a SQL database.

The csv has several free-text fields, and some of the data contains newline characters. This causes a single row to be broken across two lines.

What I would like to do is set up the code to replace the newline character with a space whenever a line has fewer delimiters than expected, since I know how many columns to expect. I really have no idea how to do this. My current code is below.

batch = list()
with open(file, "r", errors='ignore') as f:
    for l in f.readlines()[1:]:
        # append the processed row to the batch list
        # processed row meaning we strip down the fields to remove redundant data
        # and add Nones if the length of the row is not up to the FIELDS_COUNT
        list_pre = l.split("#|#")
        batch.append([i.strip() for i in list_pre])

So the input looks like this:

col1#|#col2#|#col3#|#col4#|#col5


col1#|#col2#|#co


l3#|#col4#|#col5

col1#|#col2#|#col3#|#col4#|#col5

col1#|#col2#|#col3#|#col4#|#col5

expected output:

['col1','col2','col3','col4','col5']

['col1','col2','col3','col4','col5']

['col1','col2','col3','col4','col5']

['col1','col2','col3','col4','col5']

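You can try this:

# Assumes `cursor2` is an open database cursor and `writer` is a csv.writer;
# neither is defined in this snippet.
for row in cursor2.fetchall():
    temp_list = []
    for item in row:
        # strip surrounding whitespace (including newlines) from text fields only
        if isinstance(item, str):
            item = item.strip()
        temp_list.append(item)
    row = tuple(temp_list)
    writer.writerow(row)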

Okay, so this is how I addressed it: if a line has fewer fields than the total number of columns, I append the next line to it. To be honest, this doesn't work if there are multiple newlines in one row, so it may break at some future point, but it works for now.

file = 'fakecsv.txt'
FIELDS_COUNT = 5  # number of columns expected in every complete row

batch = list()
list_pre0 = []  # holds the first fragment of a row that was split by a newline
with open(file, "r", errors='ignore') as f:
    for l in f.readlines()[1:]:  # skip the header line
        list_pre = l.split("#|#")
        if len(list_pre) < FIELDS_COUNT:
            if len(list_pre0) == 0:
                # first fragment of a broken row: keep it and wait for the rest
                list_pre0 = list_pre
            else:
                # second fragment: glue it onto the field that was cut in two
                replace_value = list_pre0[-1].replace('\n', '') + list_pre[0]
                del list_pre0[-1]
                del list_pre[0]
                list_pre0.append(replace_value)
                combined = list_pre0 + list_pre
                batch.append([i.strip() for i in combined])
                list_pre0 = []
            continue
        batch.append([i.strip() for i in list_pre])
print(batch)
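
For the case mentioned above where one row contains more than one newline, a possible variation is to keep buffering physical lines until the buffer holds enough delimiters for a full row. This is only a sketch, not tested against the real file: the function name `read_logical_rows` and the `joiner` parameter are made up for illustration, and it assumes 5 columns and the `#|#` delimiter from the question. Blank lines are skipped, and an incomplete trailing row is padded with None.

```python
DELIMITER = "#|#"
FIELDS_COUNT = 5  # assumed number of columns per logical row


def read_logical_rows(lines, delimiter=DELIMITER,
                      fields_count=FIELDS_COUNT, joiner=" "):
    """Yield one list of stripped fields per logical row.

    Physical lines are buffered until the buffer contains
    fields_count - 1 delimiters. Fragments are joined with `joiner`
    (a space by default; pass "" to concatenate them directly).
    """
    buffer = ""
    for line in lines:
        piece = line.rstrip("\r\n")
        if not piece.strip():
            continue  # ignore blank lines, inside or between rows
        buffer = piece if not buffer else buffer + joiner + piece
        if buffer.count(delimiter) >= fields_count - 1:
            yield [field.strip() for field in buffer.split(delimiter)]
            buffer = ""
    if buffer.strip():
        # Leftover fragment at end of input: pad it out with None.
        fields = [field.strip() for field in buffer.split(delimiter)]
        yield fields + [None] * (fields_count - len(fields))


# Small in-memory demo; with a real file, pass the open file object
# instead (after skipping the header line with next(f)).
sample = [
    "col1#|#col2#|#co\n",
    "\n",
    "l3#|#col4#|#col5\n",
    "col1#|#col2#|#col3#|#col4#|#col5\n",
]
batch = list(read_logical_rows(sample, joiner=""))
print(batch)
```

One caveat: counting delimiters cannot detect a newline inside the last column, because the line already looks complete at that point, and the leftover text would be merged into the following row.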
