I have limited Python knowledge, so I'm having a lot of trouble fixing this.
After extracting text from a PDF file and doing a small cleanup, I got the following result:
"BARRINE QLD 4872ARCHDALE VIC 3475ARCHDALE JUNCTION VIC 3475ARCHER NT 0830ARCHER RIVER QLD 4892"
(This is a small sample from a much larger result!)
Is there a way to add a line break after the numbers? So, instead of the string above, I'd have something similar to this:
'BARRINE QLD 4872',
'ARCHDALE VIC 3475'
I tried reading different articles about this, but perhaps due to my lack of knowledge I simply can't figure it out!
This is not the most elegant solution, but something like this might work:
    string = "BARRINE QLD 4872ARCHDALE VIC 3475ARCHDALE JUNCTION VIC 3475ARCHER NT 0830ARCHER RIVER QLD 4892"

    def split_at_numbers(string):
        char_at = 0      # index of the character being examined
        temp_str = ""    # characters collected for the current entry
        out = []         # completed entries
        while char_at < len(string):
            temp_str += string[char_at]
            try:
                # An entry ends at a digit that is not followed by another digit.
                if string[char_at].isnumeric() and not string[char_at + 1].isnumeric():
                    out.append(temp_str)
                    temp_str = ""
            except IndexError:
                # The last character is a digit, so there is no next character
                # to look at: store the final entry.
                out.append(temp_str)
            char_at += 1
        return out

    print(split_at_numbers(string))
    # output: ['BARRINE QLD 4872', 'ARCHDALE VIC 3475', 'ARCHDALE JUNCTION VIC 3475', 'ARCHER NT 0830', 'ARCHER RIVER QLD 4892']
The loop above iterates over each character and checks whether that character is (1) a digit and (2) not followed by another digit. If both conditions are true, we break off that section and start the next one. When the very last character is a digit, looking at the following character raises an IndexError, which we catch to store the final section. Each section is stored in a list, which we return at the end.
From there, the data should be easy to work with.
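As a shorter alternative to the loop above, the same split can be sketched with a regular expression. This is only a sketch under two assumptions: the place names never contain digits, and every entry ends with a run of digits (the postcode). The names `split_records`, `sample`, and `records` are made up for this example.

```python
import re

def split_records(text):
    # Each entry is one or more non-digit characters followed by
    # one or more digits (the postcode at the end of the entry).
    return re.findall(r"\D+\d+", text)

sample = "BARRINE QLD 4872ARCHDALE VIC 3475ARCHDALE JUNCTION VIC 3475ARCHER NT 0830ARCHER RIVER QLD 4892"
records = split_records(sample)

# Join the entries with newlines to get one entry per line.
print("\n".join(records))
```

Note that any trailing text that does not end in digits is not matched by this pattern, so it would be left out of the result. If the real data has spaces between entries, calling `.strip()` on each item would remove them.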