I have an old Windows 95 program whose data export leaves out account numbers, whether an account is seasonal, and whether an account contains a sub-account.
I am, however, able to print the customer information and notes, which do contain that information, to a PDF file and copy the text into Notepad. I would like to extract the data from that text.
The data comes in this order: 1) page headers (I do not need this data):
Company Name
Customer Information and Notes
Computed Monday, August 10 2015 Page 1
2) standard titles and 3) the data after titles:
Ser Name: Block, Sunny Route: 1
Address: 3354 ASPEN RD. Frequency: Monthly
Address: ST PETE, GA 33333 Week/Day: First Monday
City State Zip: data Sched Time (HH:MM): 10:00A
Ser Phone: 555-1212 Service: BASIC SERVICE
Bill to: BLOCK,SUNNY Rate ($): 24.00
Company Name
Customer Information and Notes
Computed Monday, August 10 2015 Page 2
Address: 1123 Sligh Terms: CASH
Address: Apt B
notes: Sunny has a mean dog
Do not enter unless dog is put up
Then it loops to the next customer's data, and so on.
The main titles never change (Ser Name, Route, Address, notes, Ser Phone, and so on). There is a fixed number of titles, always in the same order; however, the notes: title can be followed by 1 to 16 lines, and the page header can appear anywhere in the data. Also, although the titles are in order, Address appears four times: lines 1 and 2 of the service address, then lines 1 and 2 of the billing address.
I would like to assign these titles to variables and keep only what comes after them, doing the extraction in PHP. Is there any way to do this?
I don't think a perfect solution is possible, but FWIW, maybe this is good enough for you.
Without a known / reliable delimiter between clients, I can't think of any good way you can get the notes without having the header stuff for the next company included, unless you can do something involving a big lookup table of all client names.
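One partial workaround, if the page header really is always the same three lines as in the question's sample (the company name, "Customer Information and Notes", and a "Computed ... Page N" line), would be to delete those lines before doing anything else. Below is a rough sketch under that assumption; the function name `stripPageHeaders` and the default company name are made up for illustration and would need adjusting to the real file.

```php
<?php
// Hypothetical helper: removes the three-line page header wherever it occurs.
// Assumes the header always has the shape shown in the question's sample.
function stripPageHeaders(string $text, string $companyName = 'Company Name'): string
{
    $pattern = '~^[ \t]*' . preg_quote($companyName, '~') . '[ \t]*\R'
             . '[ \t]*Customer Information and Notes[ \t]*\R'
             . '[ \t]*Computed [^\r\n]*Page \d+[ \t]*(?:\R|\z)~m';

    // preg_replace returns null on a regex error; fall back to the input then.
    return preg_replace($pattern, '', $text) ?? $text;
}

$sample = implode("\n", [
    'Bill to: BLOCK,SUNNY Rate ($): 24.00',
    'Company Name',
    'Customer Information and Notes',
    'Computed Monday, August 10 2015 Page 2',
    'Address: 1123 Sligh Terms: CASH',
]);

// The header lines in the middle are dropped; the data lines are left untouched.
echo stripPageHeaders($sample), "\n";
```

This only helps with the page header. A note line that happens to look exactly like a header line would be removed too, so it is worth spot-checking the result against the PDF.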
I do have (an ugly) regex that may reliably help as far as the other stuff though:
$content = '[the contents of your file]';
$titles = 'Ser Name|Route|Address|Frequency|Week/Day|City State Zip|Sched Time \(HH:MM\)|Ser Phone|Service|Bill to|Rate \(\$\)|Terms|notes';
// The ":" inside the lookahead ends a value only at a real "title:", so with the i flag a word like SERVICE inside a value does not cut it short.
preg_match_all('~(' . $titles . '):\s*((?:(?!(?:' . $titles . '):).)+)~is', $content, $matches);
So this basically looks for a title (the "header") and puts it into the first captured group, and then matches everything up to the next title and puts that into the second captured group.
Perhaps this is good enough for you, but TBH I can't think of anything better you can do, unless you can improve your extraction to a better format.
So your example data would output:
Array
(
[0] => Array
(
[0] => Ser Name: Block, Sunny
[1] => Route: 1
[2] => Address: 3354 ASPEN RD.
[3] => Frequency: Monthly
[4] => Address: ST PETE, GA 33333
[5] => Week/Day: First Monday
[6] => City State Zip: data
[7] => Sched Time (HH:MM): 10:00A
[8] => Ser Phone: 555-1212
[9] => Service: BASIC SERVICE
[10] => Bill to: BLOCK,SUNNY
[11] => Rate ($): 24.00
Company Name
Customer Information and Notes
Computed Monday, August 10 2015 Page 2
[12] => Address: 1123 Sligh
[13] => Terms: CASH
[14] => Address: Apt B
[15] => notes: Sunny has a mean dog
)
[1] => Array
(
[0] => Ser Name
[1] => Route
[2] => Address
[3] => Frequency
[4] => Address
[5] => Week/Day
[6] => City State Zip
[7] => Sched Time (HH:MM)
[8] => Ser Phone
[9] => Service
[10] => Bill to
[11] => Rate ($)
[12] => Address
[13] => Terms
[14] => Address
[15] => notes
)
[2] => Array
(
[0] => Block, Sunny
[1] => 1
[2] => 3354 ASPEN RD.
[3] => Monthly
[4] => ST PETE, GA 33333
[5] => First Monday
[6] => data
[7] => 10:00A
[8] => 555-1212
[9] => BASIC SERVICE
[10] => BLOCK,SUNNY
[11] => 24.00
Company Name
Customer Information and Notes
Computed Monday, August 10 2015 Page 2
[12] => 1123 Sligh
[13] => CASH
[14] => Apt B
[15] => Sunny has a mean dog
)
)
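To get from these flat arrays to one set of named values per customer, the matches could be grouped by starting a new record at every "Ser Name" and numbering the four "Address" titles in the order they appear. The sketch below is only one way to do it and rests on a few assumptions: the page headers have already been removed from the text, every customer starts with "Ser Name", and the key names "Service Address 1" through "Billing Address 2" as well as the function name `parseCustomers` are invented for illustration.

```php
<?php
// Hypothetical helper: groups "title: value" pairs into one array per customer.
// Assumes page headers were already removed from $content.
function parseCustomers(string $content): array
{
    $titles = 'Ser Name|Route|Address|Frequency|Week/Day|City State Zip|'
            . 'Sched Time \(HH:MM\)|Ser Phone|Service|Bill to|Rate \(\$\)|Terms|notes';
    // Group 1: the title. Group 2: everything up to the next "title:".
    $regex = '~(' . $titles . '):\s*((?:(?!(?:' . $titles . '):).)+)~is';
    // Assumed names for the four "Address" titles, in order of appearance.
    $addressKeys = ['Service Address 1', 'Service Address 2',
                    'Billing Address 1', 'Billing Address 2'];

    preg_match_all($regex, $content, $matches, PREG_SET_ORDER);

    $customers = [];
    $current = null;
    $addressCount = 0;
    foreach ($matches as $m) {
        $title = $m[1];
        $value = trim($m[2]);
        if (strcasecmp($title, 'Ser Name') === 0) {
            // "Ser Name" opens a new customer record.
            if ($current !== null) {
                $customers[] = $current;
            }
            $current = [];
            $addressCount = 0;
        }
        if ($current === null) {
            continue; // ignore anything before the first "Ser Name"
        }
        if (strcasecmp($title, 'Address') === 0) {
            $title = $addressKeys[$addressCount] ?? 'Address ' . ($addressCount + 1);
            $addressCount++;
        }
        $current[$title] = $value;
    }
    if ($current !== null) {
        $customers[] = $current;
    }
    return $customers;
}

$sample = implode("\n", [
    'Ser Name: Block, Sunny Route: 1',
    'Address: 3354 ASPEN RD. Frequency: Monthly',
    'Address: ST PETE, GA 33333 Week/Day: First Monday',
    'City State Zip: data Sched Time (HH:MM): 10:00A',
    'Ser Phone: 555-1212 Service: BASIC SERVICE',
    'Bill to: BLOCK,SUNNY Rate ($): 24.00',
    'Address: 1123 Sligh Terms: CASH',
    'Address: Apt B',
    'notes: Sunny has a mean dog',
    'Do not enter unless dog is put up',
]);

$customers = parseCustomers($sample);
print_r($customers[0]);
```

With the sample above, `$customers[0]['Billing Address 1']` holds `1123 Sligh`, and the `notes` entry keeps both note lines, separated by a newline. A note that itself contains one of the titles followed by a colon would still be split at that point, so this remains a best-effort approach.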