I am having a problem processing a CSV that I'm loading into an SQL database.
The CSV has several free-text fields, and some of the data contains newline characters. This causes a single record to be broken across two lines.
What I would like to do is set up the code to replace the newline character with a space whenever a line has fewer delimiters than expected, since I know how many columns to expect. I really have no idea how to do this. My current code is below.
batch = list()
with open(file, "r", errors='ignore') as f:
    for l in f.readlines()[1:]:
        # append the processed row to the batch list
        # processed row meaning we strip down the fields to remove redundant data
        # and add Nones if the length of the row is not up to the FIELDS_COUNT
        list_pre = l.split("#|#")
        batch.append([i.strip() for i in list_pre])
So the input looks like this:
col1#|#col2#|#col3#|#col4#|#col5
col1#|#col2#|#co
l3#|#col4#|#col5
col1#|#col2#|#col3#|#col4#|#col5
col1#|#col2#|#col3#|#col4#|#col5
Expected output:
['col1','col2','col3','col4','col5']
['col1','col2','col3','col4','col5']
['col1','col2','col3','col4','col5']
['col1','col2','col3','col4','col5']
You can try this:
for row in cursor2.fetchall():
    temp_list = []
    for item in row:
        if isinstance(item, str):
            item = item.strip()
        temp_list.append(item)
    row = tuple(temp_list)
    writer.writerow(row)
Okay, so this is how I addressed it: if a line has fewer fields than the total number of columns, I append the next line to it. To be honest, this doesn't work if there are multiple newlines in one row, so it may fail at some future point, but it works for now.
file = 'fakecsv.txt'
batch = list()
list_pre0 = []  # holds the first half of a broken row until the next line arrives
print(len(list_pre0))
with open(file, "r", errors='ignore') as f:
    for l in f.readlines()[1:]:
        list_pre = l.split("#|#")
        # fewer fields than the 5 expected columns: this line is a fragment
        if len(list_pre) < 5:
            print(len(list_pre))
            if len(list_pre0) == 0:
                # first fragment: remember it and wait for the next line
                list_pre0 = list_pre
            else:
                # second fragment: glue the split field back together
                replace_value = list_pre0[-1].replace('\n','') + list_pre[0]
                print('replace value equals: ' + replace_value)
                del list_pre0[-1]
                del list_pre[0]
                list_pre0.append(replace_value)
                combined = list_pre0 + list_pre
                batch.append([i.strip() for i in combined])
                list_pre0 = []
            continue
        print(len(list_pre))
        batch.append([i.strip() for i in list_pre])
print(batch)
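For the case mentioned above where one row contains several newlines, a possible variation is to keep accumulating physical lines until the buffer holds the expected number of delimiters. This is only a sketch under assumptions: the names merge_broken_rows, EXPECTED_FIELDS and joiner are made up for illustration, and the column count of 5 is taken from the sample input. The default joiner is an empty string so the result matches the expected output shown in the question; pass joiner=" " to get a space instead. A newline inside the last column cannot be detected this way, because the row already has its full delimiter count when the break occurs.

```python
DELIMITER = "#|#"
EXPECTED_FIELDS = 5  # assumption: column count taken from the sample input


def merge_broken_rows(lines, delimiter=DELIMITER,
                      expected_fields=EXPECTED_FIELDS, joiner=""):
    """Yield one list of stripped fields per logical row.

    Physical lines are glued together with `joiner` until the buffer
    contains at least expected_fields - 1 delimiters.
    """
    buffer = ""
    for line in lines:
        piece = line.rstrip("\r\n")
        # Start a new buffer, or extend the pending fragment.
        buffer = piece if not buffer else buffer + joiner + piece
        if buffer.count(delimiter) >= expected_fields - 1:
            yield [field.strip() for field in buffer.split(delimiter)]
            buffer = ""
    if buffer:
        # Leftover fragment that never reached the expected width.
        yield [field.strip() for field in buffer.split(delimiter)]


# Sample data mirroring the input from the question (header already skipped).
sample_lines = [
    "col1#|#col2#|#col3#|#col4#|#col5\n",
    "col1#|#col2#|#co\n",
    "l3#|#col4#|#col5\n",
    "col1#|#col2#|#col3#|#col4#|#col5\n",
    "col1#|#col2#|#col3#|#col4#|#col5\n",
]
batch = list(merge_broken_rows(sample_lines))
for row in batch:
    print(row)
```

With a real file, the same function could be fed the lines after the header, for example by opening the file, calling next(f, None) to skip the first line, and passing f itself as lines, which also avoids reading everything into memory with readlines().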