
PHP extracting data from text

I have an old Windows 95 program whose data export leaves out account numbers, seasonal-account flags, and whether an account contains a sub-account.

I can, however, print the customer information and notes, which do contain that information, to a PDF file and copy the text into Notepad. I would like to extract the data from that text.

The data appears in this order: 1) page headers (I do not need this data):

Company Name

Customer Information and Notes

Computed Monday, August 10 2015 Page 1

2) the standard titles and 3) the data after each title:

Ser Name: Block, Sunny Route: 1

Address: 3354 ASPEN RD. Frequency: Monthly

Address: ST PETE, GA 33333 Week/Day: First Monday

City State Zip: data Sched Time (HH:MM): 10:00A

Ser Phone: 555-1212 Service: BASIC SERVICE

Bill to: BLOCK,SUNNY Rate ($): 24.00

Company Name

Customer Information and Notes

Computed Monday, August 10 2015 Page 2

Address: 1123 Sligh Terms: CASH

Address: Apt B

notes: Sunny has a mean dog

Do not enter unless dog is put up

Then it loops to the next customer's data, and so on.

The main titles never change (Ser Name, Route, Address, notes, Ser Phone, and so on). There is a fixed number of titles, always in the same order. However, the notes: title can span 1 to 16 lines, and the page header appears at unpredictable positions throughout the data. Also, although the titles are in order, Address appears four times: service address lines 1 and 2, and billing address lines 1 and 2.

I would like to assign these titles to variables and keep only what follows each one, doing the extraction in PHP. Is there any way to do this?

I don't think a perfect solution is possible, but for what it's worth, maybe this is good enough for you.

Without a known, reliable delimiter between clients, I can't think of any good way to get the notes without the header lines for the next company being included, unless you can do something involving a big lookup table of all client names.

I do have an (ugly) regex that may reliably help with the other fields, though:
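If the page header really is the same three lines every time, one possible workaround is to strip it from the text before matching, so it cannot end up inside a notes value. The following is only a minimal sketch under that assumption: `stripPageHeaders` is a hypothetical helper, and the literal `Company Name` stands in for whatever the real header prints.

```php
<?php
// Hypothetical helper: removes the repeating three-line page header.
// Assumes the header is always the company name, the report title, and a
// "Computed ... Page N" line, separated only by whitespace.
function stripPageHeaders(string $text, string $companyName): string
{
    $pattern = '~^[ \t]*' . preg_quote($companyName, '~') . '\s+'
             . 'Customer Information and Notes\s+'
             . 'Computed [^\r\n]*? Page \d+[ \t]*\R?~m';

    // preg_replace() returns null on error; fall back to the untouched text.
    return preg_replace($pattern, '', $text) ?? $text;
}

// Shortened sample in the layout shown in the question.
$sample = "Bill to: BLOCK,SUNNY Rate (\$): 24.00\n"
        . "Company Name\n\nCustomer Information and Notes\n\n"
        . "Computed Monday, August 10 2015 Page 2\n"
        . "Address: 1123 Sligh Terms: CASH\n";

$clean = stripPageHeaders($sample, 'Company Name');
echo $clean;
```

This only helps when the header text is stable. A note line that happened to look exactly like the header would be removed as well.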

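$content = '[the contents of your file]';
// Every known title; parentheses and the dollar sign are escaped for the regex.
$titles = 'Ser Name|Route|Address|Frequency|Week/Day|City State Zip|Sched Time \(HH:MM\)|Ser Phone|Service|Bill to|Rate \(\$\)|Terms|notes';
// Group 1 is the title, group 2 is everything up to the next "title:".
// The lookahead requires the trailing colon; without it, the i flag would cut "BASIC SERVICE" short at "SERVICE".
preg_match_all('~(' . $titles . '):\s*((?:(?!(?:' . $titles . '):).)+)~is', $content, $matches);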

This looks for a title and puts it into the first capture group, then matches everything up to the next title and puts that into the second capture group.
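To make the two capture groups easier to work with, the matches can be read as title/value pairs. This is a small sketch rather than a finished parser: it uses a shortened title list and sample string, a lookahead that also requires the trailing colon, and `$pairs` is a name made up for this example.

```php
<?php
// Shortened sample in the same "Title: value" layout as the question.
$content = "Ser Name: Block, Sunny Route: 1\n"
         . "Ser Phone: 555-1212 Service: BASIC SERVICE\n";

// Only the titles that occur in the sample above.
$titles  = 'Ser Name|Route|Ser Phone|Service';
$pattern = '~(' . $titles . '):\s*((?:(?!(?:' . $titles . '):).)+)~is';

// PREG_SET_ORDER yields one array per match: [whole match, title, value].
preg_match_all($pattern, $content, $matches, PREG_SET_ORDER);

$pairs = [];
foreach ($matches as $m) {
    // trim() drops the trailing spaces and newlines captured before the next title.
    $pairs[] = [$m[1], trim($m[2])];
}

foreach ($pairs as [$title, $value]) {
    echo $title, ' => ', $value, PHP_EOL;
}
```

With `PREG_SET_ORDER`, each title stays next to its value, which avoids indexing two parallel arrays.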

Perhaps this is good enough for you; to be honest, I can't think of anything better unless you can get the export into a better format.

For your example data, $matches would contain:

Array
(
    [0] => Array
        (
            [0] => Ser Name: Block, Sunny 
            [1] => Route: 1


            [2] => Address: 3354 ASPEN RD. 
            [3] => Frequency: Monthly


            [4] => Address: ST PETE, GA 33333 
            [5] => Week/Day: First Monday


            [6] => City State Zip: data 
            [7] => Sched Time (HH:MM): 10:00A


            [8] => Ser Phone: 555-1212 
            [9] => Service: BASIC SERVICE


            [10] => Bill to: BLOCK,SUNNY 
            [11] => Rate ($): 24.00

Company Name

Customer Information and Notes

Computed Monday, August 10 2015 Page 2


            [12] => Address: 1123 Sligh 
            [13] => Terms: CASH


            [14] => Address: Apt B


            [15] => notes: Sunny has a mean dog
        )

    [1] => Array
        (
            [0] => Ser Name
            [1] => Route
            [2] => Address
            [3] => Frequency
            [4] => Address
            [5] => Week/Day
            [6] => City State Zip
            [7] => Sched Time (HH:MM)
            [8] => Ser Phone
            [9] => Service
            [10] => Bill to
            [11] => Rate ($)
            [12] => Address
            [13] => Terms
            [14] => Address
            [15] => notes
        )

    [2] => Array
        (
            [0] => Block, Sunny 
            [1] => 1


            [2] => 3354 ASPEN RD. 
            [3] => Monthly


            [4] => ST PETE, GA 33333 
            [5] => First Monday


            [6] => data 
            [7] => 10:00A


            [8] => 555-1212 
            [9] => BASIC SERVICE


            [10] => BLOCK,SUNNY 
            [11] => 24.00

Company Name

Customer Information and Notes

Computed Monday, August 10 2015 Page 2


            [12] => 1123 Sligh 
            [13] => CASH


            [14] => Apt B


            [15] => Sunny has a mean dog
        )

)
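Since the goal was one variable per title, the matches could be folded into one associative array per customer. The sketch below rests on two assumptions taken from the question: every customer record starts with `Ser Name:`, and the four `Address` titles always appear in the order service line 1, service line 2, billing line 1, billing line 2. The function name `parseCustomers`, the address key names, and the second customer in the sample are invented for illustration.

```php
<?php
$titles = 'Ser Name|Route|Address|Frequency|Week/Day|City State Zip|'
        . 'Sched Time \(HH:MM\)|Ser Phone|Service|Bill to|Rate \(\$\)|Terms|notes';
// "*" instead of "+" so that a title with an empty value still matches.
$pattern = '~(' . $titles . '):\s*((?:(?!(?:' . $titles . '):).)*)~is';

// Invented key names for the four "Address" occurrences, in document order.
$addressKeys = ['service_address_1', 'service_address_2',
                'billing_address_1', 'billing_address_2'];

function parseCustomers(string $text, string $pattern, array $addressKeys): array
{
    $customers = [];
    // Assumption: each customer record begins with "Ser Name:".
    $chunks = preg_split('~(?=Ser Name:)~i', $text, -1, PREG_SPLIT_NO_EMPTY);
    foreach ($chunks as $chunk) {
        if (stripos($chunk, 'Ser Name:') !== 0) {
            continue; // text before the first customer, e.g. a page header
        }
        preg_match_all($pattern, $chunk, $matches, PREG_SET_ORDER);
        $record = [];
        $addressIndex = 0;
        foreach ($matches as $m) {
            $key = $m[1];
            if (strcasecmp($key, 'Address') === 0) {
                // Rename repeated "Address" titles by their position.
                $key = $addressKeys[$addressIndex] ?? 'address_' . ($addressIndex + 1);
                $addressIndex++;
            }
            $record[$key] = trim($m[2]);
        }
        $customers[] = $record;
    }
    return $customers;
}

// Sample based on the question; the second customer is made up.
$sample = "Ser Name: Block, Sunny Route: 1\n"
        . "Address: 3354 ASPEN RD. Frequency: Monthly\n"
        . "Address: ST PETE, GA 33333 Week/Day: First Monday\n"
        . "Bill to: BLOCK,SUNNY Rate (\$): 24.00\n"
        . "Address: 1123 Sligh Terms: CASH\n"
        . "Address: Apt B\n"
        . "notes: Sunny has a mean dog\nDo not enter unless dog is put up\n"
        . "Ser Name: Doe, Jane Route: 2\n";

$customers = parseCustomers($sample, $pattern, $addressKeys);
print_r($customers);
```

Any page-header lines still present in the text would end up inside whichever value precedes them, so this works best on text where the headers are already gone.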


 