
Split String in Text File to Multiple Rows in Python

I have a string within a text file that reads as one row, but I need to split the string into multiple rows based on a separator. If possible, I would like to separate the elements in the string based on the period (.) separating the different line elements listed here:

"Line 1: Element '{URL1}Decimal': 'x' is not a valid value of the atomic type 'xs:decimal'.Line 2: Element '{URL2}pos': 'y' is not a valid value of the atomic type 'xs:double'.Line 3: Element '{URL3}pos': 'yz' is not a valid value of the list type '{list1}doubleList'"

Here is my current script, which reads the .txt file and converts it to a CSV, but does not separate each entry into its own row.

import glob
import csv
import os

path = "C:\\Users\\mdl518\\Desktop\\txt_strip\\"

with open(os.path.join(path,"test.txt"), 'r') as infile, open(os.path.join(path,"test.csv"), 'w') as outfile:
       stripped = (line.strip() for line in infile)
       lines = (line.split(",") for line in stripped if line)
       writer = csv.writer(outfile)
       writer.writerows(lines)

If possible, I would like to just write to a .txt file with multiple rows, but a .csv would also work. Any help is most appreciated!

One way to make it work:

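import csv
import os

path = "C:\\Users\\mdl518\\Desktop\\txt_strip\\"

# newline="" lets the csv module control line endings (avoids blank rows on Windows)
with open(os.path.join(path, "test.txt"), "r") as infile, open(os.path.join(path, "test.csv"), "w", newline="") as outfile:
    # Drop surrounding whitespace from each input line
    stripped = (line.strip() for line in infile)
    # Split each non-empty line on "." and wrap every piece in its own one-field row
    lines = ([sent] for para in (line.split(".") for line in stripped if line) for sent in para)
    writer = csv.writer(outfile)
    writer.writerows(lines)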

Explanation below:

The output is a single line because writer.writerows() expects a 2D structure (a sequence of rows), and here that structure contains only one row, which holds the entire paragraph. To visualise it, "lines" is stored as [[s1,s2,s3]], whereas writer.writerows() needs [[s1],[s2],[s3]] to write each element on its own row.
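The difference between the two shapes can be checked in isolation. The following is a minimal sketch that writes to an in-memory buffer instead of a file; the helper name rows_to_csv and the placeholder values s1, s2, s3 are only for illustration.

```python
import csv
import io

def rows_to_csv(rows):
    """Write rows with csv.writer and return the resulting text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()

# One row containing three fields: everything ends up on a single line.
one_row = rows_to_csv([["s1", "s2", "s3"]])

# Three rows of one field each: every element gets its own line.
three_rows = rows_to_csv([["s1"], ["s2"], ["s3"]])

print(repr(one_row))     # 's1,s2,s3\n'
print(repr(three_rows))  # 's1\ns2\ns3\n'
```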

Two changes are needed.

(1) Use the period '.' as the separator: line.split(".")

(2) Iterate over each split list inside the generator expression, so that every piece becomes its own row: lines = ([sent] for para in (line.split(".") for line in stripped if line) for sent in para)
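If the nested generator expression is hard to read, the same logic can be written with explicit loops. This is a sketch of an equivalent form, not a different method; the function name split_into_rows and the shortened sample text are assumptions made for illustration.

```python
def split_into_rows(lines, separator="."):
    """Yield one single-field row per separator-delimited piece."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines, as the generator expression does
        for sent in line.split(separator):
            yield [sent]

# Shortened stand-in for the contents of test.txt
sample = ["Line 1: a.Line 2: b.Line 3: c\n"]
rows = list(split_into_rows(sample))
print(rows)  # [['Line 1: a'], ['Line 2: b'], ['Line 3: c']]
```

The resulting rows can be passed to writer.writerows() unchanged.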

str.split() splits a string on a separator and returns the pieces as a list. In your case, each of those lists was itself collected as a single element of the outer comprehension, which produced a 2D structure: the paragraph was stored as [[s1,s2,s3]].
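Since plain .txt output was the preferred result, here is a hedged sketch of that variant. It assumes every entry begins with "Line <number>:", as in the sample from the question, and breaks only on a period that is directly followed by such a marker, so periods inside real URLs or decimal values would not cause extra splits. The function names and the output file name test_split.txt are hypothetical.

```python
import re

def split_messages(text):
    """Split on a period only when it is directly followed by 'Line <n>:'."""
    # Assumption: every entry starts with "Line <number>:", as in the sample.
    marked = re.sub(r"\.(?=Line \d+:)", ".\n", text.strip())
    return marked.splitlines()

def write_messages(path, messages):
    """Write one message per line to a plain text file."""
    with open(path, "w") as outfile:
        outfile.write("\n".join(messages) + "\n")

sample = (
    "Line 1: Element '{URL1}Decimal': 'x' is not a valid value of the atomic type 'xs:decimal'."
    "Line 2: Element '{URL2}pos': 'y' is not a valid value of the atomic type 'xs:double'."
    "Line 3: Element '{URL3}pos': 'yz' is not a valid value of the list type '{list1}doubleList'"
)

messages = split_messages(sample)
for message in messages:
    print(message)

# Example call (hypothetical output name):
# write_messages("test_split.txt", messages)
```

Unlike line.split("."), this keeps the trailing period on each entry and does not produce empty strings when the text ends with a period.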
