
How can I write a regex in PHP to grab titles?

I want to write a regular expression that will grab book/movie titles.

So far, I have this written in PHP:

(?:                                  # Start of group:
  \b                                 # Match start of a word
  (?:                                # Start of inner group:
    [A-Z]*[A-Z][a-z]*                # Either match a capitalized (or all-caps) word
    |                                # or
    (?:a[nts]|the|by|for|i[nt]|      # one of these "special" words
       o[fnr]|to|up|and|but|nor)
  )                                  # End of inner group
  \b                                 # Match end of word
  \s*                                # Match zero or more whitespace characters
)+                                   # Match one or more of the above.

My input is as follows:

I watched the movie The Girl With the Dragon Tattoo but it wasn't very good.

This matches on:

I
the
The Girl With the Dragon Tattoo but it
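The matches above can be reproduced with a short script like the one below. It is a sketch that assumes the pattern is compiled with PHP's x (extended) modifier, which is what makes PCRE ignore the line breaks and # comments in the pattern. Because of the trailing \s*, each raw match ends with a space, so the results are trimmed before printing.

```php
<?php
// The question's pattern with its comments stripped, wrapped in "/.../x".
// The x modifier makes PCRE ignore whitespace and "#" comments in the pattern.
$pattern = '/
(?:
  \b
  (?:
    [A-Z]*[A-Z][a-z]*
    |
    (?:a[nts]|the|by|for|i[nt]|o[fnr]|to|up|and|but|nor)
  )
  \b
  \s*
)+
/x';

$input = "I watched the movie The Girl With the Dragon Tattoo but it wasn't very good.";

preg_match_all($pattern, $input, $matches);

// Each raw match ends with the whitespace consumed by \s*, so trim it off.
$found = array_map('trim', $matches[0]);

echo implode(' | ', $found), "\n"; // I | the | The Girl With the Dragon Tattoo but it
```

The lowercase "the" and the trailing "but it" both come from the "special" alternative: nothing in the pattern requires a run to start or end with a capitalized word.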

I understand this is a complex issue, and while I would like it to return only:

The Girl With the Dragon Tattoo

I would be okay with:

I
The Girl With the Dragon Tattoo

How could I alter my regex to accomplish this?

As I understand the question, you want to take arbitrary user input and find a book or movie title in it.

If you have a comprehensive database of books and movies, you can build an algorithm around it.

For example, always convert the input to lowercase and check it against every title in the database.
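The lookup step might be sketched like this. The $knownTitles array is a hypothetical in-memory stand-in for the title database, and findKnownTitles is a made-up helper name. Instead of lowercasing by hand, the sketch uses the i (case-insensitive) modifier, which has the same effect and lets the title be returned as it was written in the input; the \b word boundaries keep a short title such as Jaws from matching inside a longer word.

```php
<?php
// Hypothetical stand-in for the book/movie database described above.
$knownTitles = ['The Girl with the Dragon Tattoo', 'Jaws', 'The Hobbit'];

/**
 * Hypothetical helper: return every known title that occurs in $input,
 * spelled as it appears in the input. A real database would need an
 * indexed search rather than a loop over every title.
 */
function findKnownTitles(string $input, array $knownTitles): array
{
    $found = [];
    foreach ($knownTitles as $title) {
        // preg_quote escapes regex metacharacters that may appear in a title.
        $pattern = '/\b' . preg_quote($title, '/') . '\b/i';
        if (preg_match($pattern, $input, $m)) {
            $found[] = $m[0];
        }
    }
    return $found;
}

$input = "I watched the movie The Girl With the Dragon Tattoo but it wasn't very good.";
echo implode(', ', findKnownTitles($input, $knownTitles)), "\n"; // The Girl With the Dragon Tattoo
```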

If you find a match, you can also capture a couple of words before and after the title and save them to the database. Later, when you check an input and don't find a known title, you can build a preg_match pattern from those saved context words and get as close to the title as possible.
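The context-word idea could look roughly like the following. Both helper names (contextWords, guessTitle) are hypothetical, "a couple of words" is simplified to one word on each side, and in practice the saved pair would be read from the database rather than passed along directly.

```php
<?php
/**
 * Hypothetical helper: given an input and a title known to occur in it,
 * return the single word before and after the title, or null if not found.
 */
function contextWords(string $input, string $title): ?array
{
    $pattern = '/(\w+)\W+' . preg_quote($title, '/') . '\W+(\w+)/i';
    return preg_match($pattern, $input, $m) ? [$m[1], $m[2]] : null;
}

/**
 * Hypothetical helper: use a saved (before, after) word pair to guess an
 * unknown title as the shortest text after $before that is followed by $after.
 */
function guessTitle(string $input, string $before, string $after): ?string
{
    $pattern = '/\b' . preg_quote($before, '/') . '\s+(.+?)\s+'
             . preg_quote($after, '/') . '\b/i';
    return preg_match($pattern, $input, $m) ? $m[1] : null;
}

// Learn context from an input whose title is already in the database...
$context = contextWords(
    "I watched the movie The Girl With the Dragon Tattoo but it wasn't very good.",
    'The Girl with the Dragon Tattoo'
);
// ...then reuse it on an input whose title is unknown.
echo guessTitle("I watched the movie Jaws but I fell asleep.", $context[0], $context[1]), "\n"; // Jaws
```

The guess is only as good as the saved context: "movie ... but" will happily return any text that happens to sit between those two words.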

If you are lucky, you can then save the new title to the database.

I doubt this approach will work even remotely well, though.
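If a regex alone has to do the job, one way to get the output the question calls acceptable (I and The Girl With the Dragon Tattoo) is to require every run of title words to start and end with a capitalized word, so the lowercase "special" words are allowed only between two capitalized ones. This is a sketch built from the question's own word lists, not a general title detector: it will still pick up sentence-initial words and other proper nouns, and it cannot find titles that start with a lowercase word. [A-Z]+[a-z]* is equivalent to the question's [A-Z]*[A-Z][a-z]*.

```php
<?php
// Sketch: a title-like run must begin and end with a capitalized word; the
// question's lowercase "special" words may appear only between two such words.
$pattern = '/
    \b[A-Z]+[a-z]*\b                  # first word: capitalized or all caps
    (?:
        \s+
        (?:(?:a[nts]|the|by|for|i[nt]|o[fnr]|to|up|and|but|nor)\s+)*  # optional lowercase connectors
        [A-Z]+[a-z]*\b                # a capitalized word must close each step
    )*
/x';

$input = "I watched the movie The Girl With the Dragon Tattoo but it wasn't very good.";

preg_match_all($pattern, $input, $matches);

echo implode(' | ', $matches[0]), "\n"; // I | The Girl With the Dragon Tattoo
```

Because the connectors must be followed by another capitalized word, "but it" after Tattoo can no longer be absorbed. Dropping the lone I would need an extra rule, such as requiring at least two words, but that would also drop one-word titles.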
