I have a delimited file that's causing me a bit of grief. It's pipe-delimited with 6 fields, but field 4 can be split over several lines or contain nothing. I need a way to remove the newlines from field 4.
Here's what I've got:
import csv

#header is constant
#filedone|fieldtwo|three|four|five|six
content = """"asfdd|b|c|defg
ijklmnopque2
|record|sadfe

1324|b|c|defg
ijklmnopqu
dafdsasfde2asdf
dsfdsf
dsfadfadse2fdsase2
asdfasdfasfe2
|record|afasde

3243243|b|c|defg
ijklmnopque2
|record|adf

startrecord4|b|c||record|adf
"""

def extract():
    x = []
    y = []
    x = content.split('|')
    for item in x:
        if (len(item) > 4):
            y.append(item.replace('\n', '').replace('\r', ' '))
        else:
            y.append(item)
    print(y)

if __name__ == '__main__':
    extract()
This runs, but the problem is that it outputs everything in one row. I still need it to output individual records (4 in this case) without the newlines, but I'm not sure how. Can I read the whole file with pandas.read_csv? Is there a better solution?
The header is constant across all records.
Would it work for you to replace all double newlines with a placeholder, then remove the remaining single newlines, and finally restore a single newline at each placeholder position? This relies on the records being separated by a blank line (a double newline).
You can try:
sth_unique = '#%@#'
c = content.replace('\n\n', sth_unique).replace('\n', '').replace(sth_unique, '\n')
print(c)
#"asfdd|b|c|defgijklmnopque2|record|sadfe
#1324|b|c|defgijklmnopqudafdsasfde2asdfdsfdsfdsfadfadse2fdsase2asdfasdfasfe2|record|afasde
#3243243|b|c|defgijklmnopque2|record|adf
#startrecord4|b|c||record|adf
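If the blank line between records can't be relied on, another option is to count delimiters instead. The following is only a sketch, under the assumptions that every record has exactly six fields and that `|` never appears inside a field value. The names `join_records`, `EXPECTED_FIELDS` and `sample` are made up for illustration, and `sample` is a shortened version of the data in the question.

```python
import csv

EXPECTED_FIELDS = 6  # assumption: every record has exactly six pipe-delimited fields


def join_records(text, delimiter="|", n_fields=EXPECTED_FIELDS):
    """Yield one single-line string per logical record.

    Physical lines are accumulated until the buffer holds n_fields - 1
    delimiters. This assumes the delimiter never occurs inside a field.
    """
    buffer = ""
    for line in text.splitlines():  # splitlines() strips the line terminators
        buffer += line
        if buffer.count(delimiter) >= n_fields - 1:
            yield buffer
            buffer = ""
    if buffer.strip():
        # Leftover text means the input stopped in the middle of a record.
        raise ValueError(f"incomplete record at end of input: {buffer!r}")


# Shortened, hypothetical sample based on the data in the question.
sample = (
    "asfdd|b|c|defg\n"
    "ijklmnopque2\n"
    "|record|sadfe\n"
    "\n"
    "1324|b|c|defg\n"
    "ijklmnopqu\n"
    "dsfdsf\n"
    "|record|afasde\n"
    "startrecord4|b|c||record|adf\n"
)

records = list(join_records(sample))

# csv.reader accepts any iterable of strings; QUOTE_NONE keeps stray
# double quotes in the data from being interpreted as field quoting.
rows = list(csv.reader(records, delimiter="|", quoting=csv.QUOTE_NONE))
for row in rows:
    print(row)
```

Because the grouping depends only on the field count, it behaves the same whether or not blank lines separate the records. The joined lines could also be written back to a file and then loaded with pandas.read_csv using sep='|', if you want a DataFrame at the end.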