
Value matching in regex and OpenRefine

I am trying to use the value.match command in OpenRefine 2.6 to split a column into two columns based on a four-digit date.

A sample of the text is:

"first sentence, second sentence, third sentences, 2009"

What I do is go to "Add column based on this column" and enter

value.match(\\d{4})

but I get the error

Parsing error at offset 12: Missing number, string, identifier, regex, or parenthesized expression

Any idea what the solution might be?

You need to fix 3 things to get this working:

1) As Wiktor says, you need to start and end the regular expression with a forward slash /

2) The 'match' function requires you to match the whole string in the cell, not just the fragment you need, so your regular expression needs to match the whole string

3) To extract part of a string with 'match' you need capture groups in your regular expression, that is, use ( ) around the part of the regular expression you want to extract. The captured values are put in an array, and you need to get the string out of that array to store it in a cell

So you'll need something like:

value.match(/.*(\d{4})/)[0]

to get the four-digit year from the end of the string.
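
For readers who want to check the whole-string and capture-group behaviour outside OpenRefine, here is a rough Python analogue. It is only a sketch: grel_like_match is a hypothetical helper written for this illustration, not part of OpenRefine, and it assumes that GREL's match returns an array of captured groups on success and nothing when the pattern does not cover the whole cell. Python's re.fullmatch stands in for the whole-string requirement.

```python
import re

def grel_like_match(value, pattern):
    """Hypothetical stand-in for GREL's match(): the pattern must cover
    the entire string, and the result is the list of captured groups
    (or None when the whole string does not match)."""
    m = re.fullmatch(pattern, value)
    return list(m.groups()) if m else None

sample = "first sentence, second sentence, third sentences, 2009"

# Pattern covers only the year, not the whole cell, so there is no match.
no_match = grel_like_match(sample, r"\d{4}")

# ".*" absorbs everything before the year; the parentheses capture the year.
groups = grel_like_match(sample, r".*(\d{4})")

# Index 0 pulls the single captured string out of the array.
year = groups[0]
```

The first call fails for the reason given in point 2, and the second succeeds because the leading .* lets the pattern cover the full cell while the parentheses capture only the year.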
