
RegEx string “preg_replace”

I need to do a "find and replace" on about 45k lines of a CSV file and then put this into a database.

I figured I should be able to do this with PHP and preg_replace but can't seem to figure out the expression...

The lines consist of one field and are all in the following format:

"./1/024/9780310320241/SPSTANDARD.9780310320241.jpg" or "./t/fla/8204909_flat/SPSTANDARD.8204909_flat.jpg"

The first part will always be a period, the second part will always be one alphanumeric character, the third will always be three alphanumeric characters and the fourth should always be between 1 and 13 alphanumeric characters.

I came up with the following, which seems to be right; however, I will openly admit to not knowing very much at all about regular expressions, they're a little new to me! I'm probably making a whole load of silly mistakes here...

$pattern = "/^(\.\/[0-9a-zA-Z]{1}\/[0-9a-zA-Z]{3}\/[0-9a-zA-Z]{1,13}\/)$/";
$new = preg_replace($pattern, " ", $i);

Anyway, any and all help appreciated!

Thanks, Phil

The only mistake I see is the string-end anchor $, which should be removed. Your expression is also missing the _ character:

/^(\.\/[0-9a-zA-Z]{1}\/[0-9a-zA-Z]{3}\/[0-9a-zA-Z_]{1,13}\/)/

A more general pattern would simply exclude the / character:

/^(\.\/[^\/]{1}\/[^\/]{3}\/[^\/]{1,13}\/)/
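To see the two patterns above side by side, here is a minimal sketch run against the two sample lines from the question. Replacing with an empty string (rather than the space used in the question) is my assumption about the desired result, and the variable names are only illustrative.

<?php
// Sample lines taken from the question.
$lines = [
    './1/024/9780310320241/SPSTANDARD.9780310320241.jpg',
    './t/fla/8204909_flat/SPSTANDARD.8204909_flat.jpg',
];

// "_" added to the last character class, trailing "$" removed.
$strict = '/^(\.\/[0-9a-zA-Z]{1}\/[0-9a-zA-Z]{3}\/[0-9a-zA-Z_]{1,13}\/)/';

// More general variant: anything except "/" in each path segment.
$general = '/^(\.\/[^\/]{1}\/[^\/]{3}\/[^\/]{1,13}\/)/';

// preg_replace accepts an array subject and returns an array.
// Assumption: the prefix is replaced with '' so only the file name remains.
$strictResult  = preg_replace($strict, '', $lines);
$generalResult = preg_replace($general, '', $lines);

print_r($strictResult);
print_r($generalResult);

Both calls should strip the leading directory parts and leave just the SPSTANDARD file names.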

You should use PHP's built-in parser to extract the values from the CSV before matching any patterns.
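A rough sketch of that idea, using fgetcsv to read each row before applying the regex. The in-memory stream is only a stand-in for the real file (with an actual file you would fopen its path instead), and the pattern and variable names here are my own assumptions, not something from the question.

<?php
// Hypothetical in-memory CSV standing in for the real 45k-line file.
$csv = "\"./1/024/9780310320241/SPSTANDARD.9780310320241.jpg\"\n"
     . "\"./t/fla/8204909_flat/SPSTANDARD.8204909_flat.jpg\"\n";

$handle = fopen('php://memory', 'r+');
fwrite($handle, $csv);
rewind($handle);

$fileNames = [];
// Length 0 means "no line length limit"; delimiter, enclosure and escape
// are passed explicitly.
while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
    if ($row === [null]) {
        continue; // fgetcsv returns [null] for a blank line
    }
    // The quotes are already stripped by the CSV parser, so the pattern
    // only has to deal with the path itself.
    $fileNames[] = preg_replace('#^\./[^/]/[^/]{3}/[^/]{1,13}/#', '', $row[0]);
}
fclose($handle);

print_r($fileNames);

Letting the CSV parser handle the quoting means the regex never has to account for the surrounding double quotes.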

The $ means the end of the string. So your pattern would match ./1/024/9780310320241/ and ./t/fla/8204909_flat/ if they were alone on their line (the second one only once _ is allowed in the last character class). Remove the $ and it will match the first four parts of your string, replacing them with a space.

$pattern = "/(\.\/[0-9a-z]{1}\/[0-9a-z]{3}\/[0-9a-z\_]+\.(jpg|bmp|jpeg|png))\n/is";

I just saw that your example string doesn't end with /, so maybe you should remove it from the end of your pattern. Also, an underscore is used in the filename and should be in the character class.
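A small check of the two points raised here (the $ anchor and the missing underscore), using preg_match on the question's sample lines. This is just an illustrative sketch; the variable names are mine.

<?php
$line           = './1/024/9780310320241/SPSTANDARD.9780310320241.jpg';
$underscoreLine = './t/fla/8204909_flat/SPSTANDARD.8204909_flat.jpg';

// The pattern from the question, with and without the trailing "$".
$anchored   = '/^(\.\/[0-9a-zA-Z]{1}\/[0-9a-zA-Z]{3}\/[0-9a-zA-Z]{1,13}\/)$/';
$unanchored = '/^(\.\/[0-9a-zA-Z]{1}\/[0-9a-zA-Z]{3}\/[0-9a-zA-Z]{1,13}\/)/';

// With "$" the whole string must consist of the directory prefix only.
$fullLineAnchored   = preg_match($anchored, $line);
$prefixOnlyAnchored = preg_match($anchored, './1/024/9780310320241/');

// Without "$" the prefix is found at the start of the full line.
$fullLineUnanchored = preg_match($unanchored, $line);

// "_" is not in [0-9a-zA-Z], so this line still fails until it is added.
$underscoreUnanchored = preg_match($unanchored, $underscoreLine);

var_dump($fullLineAnchored, $prefixOnlyAnchored, $fullLineUnanchored, $underscoreUnanchored);

preg_match returns 1 for a match and 0 for no match, so the anchored pattern only succeeds when the prefix is the entire string.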

I'm not sure I understand what you're asking. Do you mean every line in the file looks like that, and you want to process all of them? If so, this regex would do the trick:

'#^.*/#' 

That simply matches everything up to and including the last slash, which is what your regex would do if it weren't for that rogue '$' everyone's talking about. If there are other lines in other formats that you want to leave alone, this regex will probably suit your needs:

'#^\./\w/\w{3}/\w{1,13}/#'

Notice how I changed the regex delimiter from '/' to '#' so I don't have to escape the slashes inside. You can use almost any punctuation character for the delimiters (but of course they both have to be the same).
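Here is a quick sketch comparing the two '#'-delimited patterns. The second input line, images/photo.jpg, is a made-up example of a line "in another format" and is not from the question; replacing with an empty string is also my assumption.

<?php
$line  = './t/fla/8204909_flat/SPSTANDARD.8204909_flat.jpg';
$other = 'images/photo.jpg'; // hypothetical line in a different format

// Greedy: strips everything up to and including the last slash.
$greedyLine  = preg_replace('#^.*/#', '', $line);
$greedyOther = preg_replace('#^.*/#', '', $other);

// Specific: strips only the "./x/xxx/name/" prefix; \w also covers "_".
$specificLine  = preg_replace('#^\./\w/\w{3}/\w{1,13}/#', '', $line);
$specificOther = preg_replace('#^\./\w/\w{3}/\w{1,13}/#', '', $other);

var_dump($greedyLine, $greedyOther, $specificLine, $specificOther);

The greedy pattern rewrites both lines, while the specific one leaves the unrelated line untouched, which is the trade-off described above.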
