Split large string into substrings
I have a huge string like
ABCDEFGHIJKLM...
and I would like to split it into substrings of length 5 in this way:
>1
ABCDE
>2
BCDEF
>3
CDEFG
[...]
${string:position:length}
Extracts $length characters of substring from $string at $position.
stringZ=abcABC123ABCabc
# 0123456789.....
# 0-based indexing.
echo ${stringZ:0} # abcABC123ABCabc
echo ${stringZ:1} # bcABC123ABCabc
echo ${stringZ:7} # 23ABCabc
echo ${stringZ:0:5} # abcAB
# Five characters of substring.
Then use a loop to go through and add 1 to the position to extract each substring of length 5.
# The last start index that still yields 5 characters is length - 5.
for i in $(seq 0 $(( ${#stringZ} - 5 ))); do
    echo "${stringZ:$i:5}"
done
All from Bash string manipulation
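As a minimal sketch, the loop above could be wrapped in a reusable function so the window width becomes a parameter. The name windows and its two arguments are made up for illustration and do not come from the answer; the loop bound is chosen so that only full-length substrings are printed.

```bash
#!/usr/bin/env bash
# windows STRING WIDTH: print every substring of WIDTH characters,
# advancing one character at a time (hypothetical helper name).
windows() {
    local s=$1 w=$2 i
    # Stop at the last start position that still leaves w characters.
    for (( i = 0; i <= ${#s} - w; i++ )); do
        printf '%s\n' "${s:i:w}"
    done
}

windows ABCDEFG 5
```

A string shorter than the width produces no output at all, which is probably what you want for fixed-length windows.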
sed can do it in one shot:
kent$ echo "abcdefghijklmnopqr"|sed -r 's/(.{5})/\1 /g'
abcde fghij klmno pqr
or, depending on your needs:
kent$ echo "abcdefghijklmnopqr"|sed -r 's/(.{5})/\1\n/g'
abcde
fghij
klmno
pqr
Update
I thought it was simply a string-splitting problem and didn't read the question very carefully. Now it should give what you need:
Still one shot, but with awk this time:
kent$ echo "abcdefghijklmnopqr"|awk '{while(length($0)>=5){print substr($0,1,5);gsub(/^./,"")}}'
abcde
bcdef
cdefg
defgh
efghi
fghij
ghijk
hijkl
ijklm
jklmn
klmno
lmnop
mnopq
nopqr
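The awk one-liner above deletes the first character of the record on every pass, which rewrites the whole string each time and may get slow for a truly huge input. A sketch of an index-based variant follows; the function name awk_windows and the variable w are assumptions for illustration. It also prints the ">N" headers shown in the question.

```bash
# awk_windows [WIDTH]: read lines on stdin and print every WIDTH-character
# window, each preceded by a ">N" header (hypothetical helper name).
awk_windows() {
    awk -v w="${1:-5}" '{
        n = 0
        # i is the 1-based start of the window; stop before it runs off the end.
        for (i = 1; i + w - 1 <= length($0); i++)
            printf(">%d\n%s\n", ++n, substr($0, i, w))
    }'
}

echo "ABCDEFGHIJ" | awk_windows 5
```

The width defaults to 5 when no argument is given.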
In bash:
s=ABCDEFGHIJ
for (( i=0; i < ${#s}-4; i++ )); do
printf ">%d\n%s\n" $((i+1)) ${s:$i:5}
done
outputs
>1
ABCDE
>2
BCDEF
>3
CDEFG
>4
DEFGH
>5
EFGHI
>6
FGHIJ
str=ABCDEFGHIJKLM
splitfive(){ echo "${1:$2:5}" ; }
for (( i=0 ; i < ${#str} ; i++ )) ; do splitfive "$str" $i ; done
Or, perhaps you want to do something more intelligent with the results
#!/usr/bin/env bash
splitstr(){
printf '%s\n' "${1:$2:$3}"
}
n=$1
offset=$2
declare -a by_fives
while IFS= read -r str ; do
for (( i=0 ; i < ${#str} ; i++ )) ; do
by_fives=("${by_fives[@]}" "$(splitstr "$str" $i $n)")
done
done
echo ${by_fives[$offset]}
And then call it
$ split-by 5 2 <<<"ABCDEFGHIJKLM"
CDEFG
You can adapt it from there.
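One possible adaptation, sketched under assumptions: the script above calls $(splitstr ...) once per window, and each command substitution runs in a subshell, which may add up on long input. The variant below appends to the array with += and slices the string directly. The names collect_windows and result are hypothetical.

```bash
#!/usr/bin/env bash
# collect_windows STRING WIDTH: store every full-length window of STRING
# in the global array "result" (both names are made up for this sketch).
collect_windows() {
    local s=$1 w=$2 i
    result=()
    for (( i = 0; i <= ${#s} - w; i++ )); do
        result+=("${s:i:w}")    # append without spawning a subshell
    done
}

collect_windows ABCDEFGHIJKLM 5
echo "${#result[@]}"    # number of windows collected
echo "${result[2]}"     # prints CDEFG, the window at offset 2
```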
EDIT: trivial version in C, for performance comparison:
#include <stdio.h>
int main(void){
    FILE *f;
    long n = 0;
    char five[6];
    five[5] = '\0';
    f = fopen("inputfile", "r");
    if (f != NULL) {
        /* Print each full 5-byte window; stop at the first short read. */
        while (fread(five, sizeof(char), 5, f) == 5) {
            printf("%s\n", five);
            /* Move the window start forward by one byte. */
            fseek(f, ++n, SEEK_SET);
        }
        fclose(f);
    }
    return 0;
}
Forgive my bad C, I really don't know the language.
Would sed do this?:
$ sed 's/\(.....\)/\1\n/g' < filecontaininghugestring
...or use the split command:
$ ls
$ echo "abcdefghijklmnopqr" | split -b5
$ ls
xaa xab xac xad
$ cat xaa
abcde
split also operates on files...
sed can do it:
sed -nr ':a;h;s/(.{5}).*/\1/p;g;s/.//;ta;' <<<"ABCDEFGHIJKLM" | # split string
sed '=' | sed '1~2s/^/>/' # add line numbers and insert '>'
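The first~step address form used in 1~2 above is a GNU sed extension. Where GNU sed is not available, the numbering stage alone could be sketched with awk instead; the function name number_windows is an assumption for illustration.

```bash
# number_windows: read one window per line on stdin and print a ">N"
# header before each one (hypothetical helper name).
number_windows() {
    awk '{ print ">" NR; print }'
}

printf 'ABCDE\nBCDEF\nCDEFG\n' | number_windows
```

This replaces only the second line of the pipeline above; the first sed command that produces the windows stays as it is.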
You could use cut and specify characters instead of fields, and then change the output delimiter to whatever you need, like a new line:
echo "ABCDEFGHIJKLMNOP" | cut --output-delimiter=$'\n' -c1-5,6-10,11-15
output
ABCDE
FGHIJ
KLMNO
or
echo "ABCDEFGHIJKLMNOP" | cut --output-delimiter=$':' -c1-5,6-10,11-15
output
ABCDE:FGHIJ:KLMNO
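The range list in the cut examples above is written out by hand, which does not scale to a huge string. A rough sketch of generating it from the string length follows; the helper name chunk_ranges is made up for illustration, and --output-delimiter is assumed to be the GNU cut option used above. Note that this still yields non-overlapping chunks, not the sliding windows from the question.

```bash
#!/usr/bin/env bash
# chunk_ranges LEN WIDTH: print a cut-style list such as 1-5,6-10,...
# covering LEN characters in chunks of WIDTH (hypothetical helper name).
chunk_ranges() {
    local len=$1 w=$2 start list=
    for (( start = 1; start <= len; start += w )); do
        # Prefix a comma only when the list already has an entry.
        list+="${list:+,}${start}-$(( start + w - 1 ))"
    done
    printf '%s\n' "$list"
}

s=ABCDEFGHIJKLMNOP
chunk_ranges "${#s}" 5    # prints 1-5,6-10,11-15,16-20
cut --output-delimiter=$'\n' -c "$(chunk_ranges "${#s}" 5)" <<<"$s"
```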
fold -w5 should do the trick.
$ echo "ABCDEFGHIJKLMNOPQRSTUVWXYZ" | fold -w5
ABCDE
FGHIJ
KLMNO
PQRST
UVWXY
Z
Cheers!
Thanks to you guys I was able to find a way to do this fast! This is my solution combining a few ideas from here:
str="ABCDEFGHIJKLMNOP"
splitfive(){
    echo "$1" | cut -c "$2"- | sed -r 's/(.{5})/\1\n/g'
}
# cut numbers characters from 1, so start at offset 1 rather than 0.
for (( i=1; i <= 5; i++ )); do
    splitfive "$str" $i
done | grep -v "^$"
[The above answer was initially added to the question itself. Here are the relevant comments.]
Your splitfive could be more efficient. There's no need to pipe echo into cut; in bash you could say cut -c "$2"- <<<"$1" | sed and so on, and it will be slightly better. -- sorpigal Sep 28 '11 at 11:48
Your sed expression could also be improved to sed 's/...../&\n/g' which executes about twice as fast. -- sorpigal Sep 28 '11 at 11:56
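Putting the two comments together, a sketch of the revised solution might look like the following. It assumes bash (for the here-string) and GNU sed (for \n in the replacement). The wrapper name allfives and the final grep -x filter, which drops trailing chunks shorter than 5 characters, are additions for illustration and were not part of the comments.

```bash
#!/usr/bin/env bash
# splitfive STRING START: cut STRING from 1-based position START and
# break the remainder into 5-character chunks, one per line.
splitfive() {
    cut -c "$2"- <<<"$1" | sed 's/...../&\n/g'
}

# allfives STRING: run splitfive at offsets 1..5 and keep only lines
# that are exactly 5 characters long (hypothetical wrapper).
allfives() {
    local i
    for (( i = 1; i <= 5; i++ )); do
        splitfive "$1" "$i"
    done | grep -x '.....'
}

allfives "ABCDEFGHIJKLMNOP"
```

As in the original solution, the windows come out grouped by starting offset rather than in left-to-right order.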