How to split a camelCase string into an array in awk?

Question

How can I split a camelCase string into an array in awk using the split function?

Input:

STRING="camelCasedExample"

Desired Result:

WORDS[1]="camel"
WORDS[2]="Cased"
WORDS[3]="Example"

Bad Attempt:

split(STRING, WORDS, /([a-z])([A-Z])/);

Bad Result:

WORDS[1]="came"
WORDS[2]="ase"
WORDS[3]="xample"

Answer 1

You can't do it with split() alone, which is why GNU awk has patsplit():

$ awk 'BEGIN {
    n = patsplit("camelCasedExample", words, /(^|[[:upper:]])[[:lower:]]+/)
    for (i = 1; i <= n; i++) print words[i]   # counted loop: "for (i in words)" order is unspecified
}'
camel
Cased
Example

Answer 2

With your shown samples, please try the following. It was written and tested in GNU awk and should work in any awk. It creates an array named words whose values can be accessed by index, starting from 1, 2, 3, and so on. I am printing the values as output here; you can also use the array later however you need.

awk -F'=|"' -v s1="\"" '
{
  gsub(/[A-Z]/,"\n&",$3)
  val=(val?val ORS:"")$3
}
END{
  num=split(val,words,ORS)
  for(i=1;i<=num;i++){
    if(words[i]!=""){
      print "WORDS[" ++count "]=" s1 words[i] s1
    }
  }
}
' Input_file

Explanation: a detailed explanation of the awk code above.

awk -F'=|"' -v s1="\"" '                     ##Start the awk program; set the field separator to = OR " and set s1 to a double quote.
{
  gsub(/[A-Z]/,"\n&",$3)                     ##Globally replace each capital letter in the 3rd field with a newline followed by the letter itself.
  val=(val?val ORS:"") $3                    ##Append $3 to val, separating successive values with ORS.
}
END{                                         ##Start the END block of this program.
  num=split(val,words,ORS)                     ##Split val into the array words, using ORS as the delimiter.
  for(i=1;i<=num;i++){                       ##Loop i from 1 to num.
    if(words[i]!=""){                          ##If the words item is not empty, do the following.
       print "WORDS[" ++count "]=" s1 words[i] s1    ##Print WORDS[, the incremented count, ]=, then words[i] wrapped in s1.
    }
  }
}
'  Input_file                                ##Name of the input file.
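Answer 2 reads a file of KEY="value" lines. The same idea (insert a newline before each capital letter with gsub(), then split() on that newline alone) also works on a plain string. A minimal sketch, assuming ASCII letters and an input with no newlines or backslashes; the shell function name camel_split is hypothetical:

```shell
# camel_split (hypothetical name): print each camelCase word of $1 on its own line.
# Assumes ASCII letters only; no newlines or backslashes in $1 (-v expands escapes).
camel_split() {
  awk -v s="$1" 'BEGIN {
    gsub(/[A-Z]/, "\n&", s)      # put a newline in front of every capital letter
    n = split(s, words, "\n")    # split() now removes only the inserted newlines
    for (i = 1; i <= n; i++)
      if (words[i] != "")        # skip the empty piece a leading capital creates
        print words[i]
  }'
}

camel_split 'camelCasedExample'
```

Because the separator passed to split() is now the inserted newline rather than a pattern built from the letters themselves, no letters are consumed, which is what went wrong in the question's attempt.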

Answer 3

Here is an awk solution that should work with any POSIX-compliant awk (it relies on match(), RSTART/RLENGTH and POSIX character classes):

s='camelCasedExample'
awk '{
   while (match($0, /(^|[[:upper:]])[[:lower:]]+/)) {
      wrd = substr($0,RSTART,RLENGTH)
      print wrd
      # you can also store it in array
      arr[++n] = wrd
      $0 = substr($0,RSTART+RLENGTH)
   }
}' <<< "$s"

camel
Cased
Example
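The comment in Answer 3 mentions storing the words in an array. One way to make that reusable is to wrap the match() loop in an awk function that fills an array passed by reference and returns the count, the way split() does. A minimal sketch; camelsplit and camel_words are hypothetical names, ASCII letters are assumed, and printf replaces the bash-only <<< here-string so it also runs under a plain POSIX sh:

```shell
# camel_words (hypothetical shell wrapper): prints WORDS[i]="word" lines for $1.
camel_words() {
  printf '%s\n' "$1" | awk '
  # camelsplit (hypothetical awk function) works like split(): it fills
  # arr[1..n] with the camelCase words of str and returns n. ASCII letters assumed.
  function camelsplit(str, arr,    n) {   # n is a local variable (extra parameter)
    n = 0
    while (match(str, /(^|[A-Z])[a-z]+/)) {
      arr[++n] = substr(str, RSTART, RLENGTH)  # store the matched word
      str = substr(str, RSTART + RLENGTH)      # continue after the match
    }
    return n
  }
  {
    split("", words)                  # start from an empty array for each line
    cnt = camelsplit($0, words)
    for (i = 1; i <= cnt; i++)
      printf "WORDS[%d]=\"%s\"\n", i, words[i]
  }'
}

camel_words 'camelCasedExample'
```

Like the loop it is based on, this drops an uppercase letter that is not followed by a lowercase one, for example the B in aBCd.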

Answer 4

echo 'camelCasedExample' | mawk '{ for (_=(____=split($((_=_<_) * gsub("[>-[]", (___)"&")), __, ___) )^_; _<=____; _++) { print "","__["(_)"]",__[_] } }' OFS=' :: ' FS='^$' ___='\20\22'

 :: __[1] :: camel
 :: __[2] :: Cased
 :: __[3] :: Example
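Answer 4 is deliberately obfuscated. Decoded, it uses the same mark-then-split idea: gsub() on $0 puts the two control characters \020\022 (the value of ___) in front of every character in the range [>-[], which in ASCII is A to Z plus >, ?, @ and [; split() then cuts the line at that marker; and the loop starts at 1 because _ = _ < _ sets _ to 0 and any count raised to the power 0 is 1. A readable sketch of the same steps, narrowed to [A-Z] and printing one word per line instead of the __[i] decoration (camel_lines is a hypothetical name):

```shell
# camel_lines (hypothetical name): readable version of the Answer 4 one-liner.
camel_lines() {
  printf '%s\n' "$1" | awk '
  BEGIN { marker = "\020\022" }      # two control characters unlikely to appear in the input
  {
    gsub(/[A-Z]/, marker "&")        # put the marker in front of every capital letter in $0
    n = split($0, parts, marker)     # cut the line at the markers only
    for (i = 1; i <= n; i++)
      print parts[i]
  }'
}

camel_lines 'camelCasedExample'
```

Like the one-liner, it yields an empty first piece when the input starts with a capital letter.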


4 answers

Answer 1: 3 votes, accepted, 2022-08-03 18:04:13

Answer 2: 1 vote, 2022-08-03 17:38:06

Answer 3: 1 vote, 2022-08-03 18:28:05

Answer 4: 0 votes, 2022-08-05 10:48:49
