Splitting a large string into substrings

Question

I have a huge string like

ABCDEFGHIJKLM...

and I would like to split it into substrings of length 5 in this way:

>1
ABCDE
>2
BCDEF
>3
CDEFG
[...]

Answer 1

${string:position:length}

Extracts $length characters from $string, starting at $position.

stringZ=abcABC123ABCabc
#       0123456789.....
#       0-based indexing.

echo ${stringZ:0}                            # abcABC123ABCabc
echo ${stringZ:1}                            # bcABC123ABCabc
echo ${stringZ:7}                            # 23ABCabc

echo ${stringZ:0:5}                          # abcAB
                                             # Five characters of substring.

Then use a loop that advances the position by 1 each time to extract every substring of length 5.

# seq must run inside a command substitution; stop at the last full window
for i in $(seq 0 $(( ${#stringZ} - 5 ))); do
    echo "${stringZ:$i:5}"
done

All from Bash string manipulation.

Answer 2

sed can do it in one shot:

kent$  echo "abcdefghijklmnopqr"|sed -r 's/(.{5})/\1 /g'
abcde fghij klmno pqr

or, depending on your needs:

kent$  echo "abcdefghijklmnopqr"|sed -r 's/(.{5})/\1\n/g' 
abcde
fghij
klmno
pqr

Update

I thought it was simply a string-splitting problem and didn't read the question very carefully. Now it should give what you need.

Still one shot, but with awk this time:

kent$  echo "abcdefghijklmnopqr"|awk '{while(length($0)>=5){print substr($0,1,5);gsub(/^./,"")}}'

abcde
bcdef
cdefg
defgh
efghi
fghij
ghijk
hijkl
ijklm
jklmn
klmno
lmnop
mnopq
nopqr
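
The question also asked for a ">N" header before each window. A minimal awk sketch of that, indexing with substr instead of rewriting $0 on every pass, might look like the following. The function name windows_awk is only an illustrative wrapper and is not part of the answer above.

```shell
#!/usr/bin/env bash
# windows_awk is a hypothetical helper name, not part of the original answer.
windows_awk() {
    # Print every 5-character window of the input line, preceded by ">N".
    awk '{
        for (i = 1; i + 4 <= length($0); i++)
            printf ">%d\n%s\n", i, substr($0, i, 5)
    }' <<<"$1"
}

windows_awk "ABCDEFG"
```

For a 7-character input this prints three numbered windows (ABCDE, BCDEF, CDEFG); lines shorter than five characters produce no output.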

Answer 3

In bash:

s=ABCDEFGHIJ
for (( i=0; i < ${#s}-4; i++ )); do 
  printf ">%d\n%s\n" $((i+1)) ${s:$i:5}
done

outputs

>1
ABCDE
>2
BCDEF
>3
CDEFG
>4
DEFGH
>5
EFGHI
>6
FGHIJ
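
If the window width should not be hard-coded, the same loop can be wrapped in a function. This is only a sketch: the name sliding_windows and its two-argument interface (string, width) are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# sliding_windows is a hypothetical name; $1 is the string, $2 the window width.
sliding_windows() {
    local s=$1 width=$2 i
    # Stop at the last position where a full window still fits.
    for (( i = 0; i + width <= ${#s}; i++ )); do
        printf '>%d\n%s\n' "$((i + 1))" "${s:i:width}"
    done
}

sliding_windows "ABCDEFGHIJ" 8
```

Writing the bound as i + width <= ${#s} means a string shorter than the width simply yields nothing.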

Answer 4

str=ABCDEFGHIJKLM
splitfive(){ echo "${1:$2:5}" ; }
for (( i=0 ; i <= ${#str} - 5 ; i++ )) ; do splitfive "$str" "$i" ; done

Or, perhaps you want to do something more intelligent with the results:

#!/usr/bin/env bash

splitstr(){
    printf '%s\n' "${1:$2:$3}"
}

n=$1
offset=$2

declare -a by_fives

# Read each line from stdin and collect every full window of width n
while IFS= read -r str ; do
    for (( i=0 ; i <= ${#str} - n ; i++ )) ; do
        by_fives+=("$(splitstr "$str" "$i" "$n")")
    done
done

echo "${by_fives[$offset]}"

And then call it:

$ split-by 5 2 <<<"ABCDEFGHIJKLM"
CDEFG

You can adapt it from there.

EDIT: trivial version in C, for performance comparison:

#include <stdio.h>

int main(void){
    FILE *f;
    long n = 0;
    char five[6];

    five[5] = '\0';

    f = fopen("inputfile", "r");
    if (f == NULL) {
        perror("inputfile");
        return 1;
    }

    /* Print each full 5-byte window, then restart one byte further in. */
    while (fread(five, sizeof(char), 5, f) == 5) {
        printf("%s\n", five);
        fseek(f, ++n, SEEK_SET);
    }

    fclose(f);
    return 0;
}

Forgive my bad C, I really don't know the language.

Answer 5

Would sed do this?

$ sed 's/\(.....\)/\1\n/g' < filecontaininghugestring

Answer 6

...or use the split command:

$ ls

$ echo "abcdefghijklmnopqr" | split -b5

$ ls
xaa  xab  xac  xad

$ cat xaa
abcde

split also operates on files...
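
Note that split produces consecutive, non-overlapping 5-byte chunks rather than the sliding windows shown in the question. A small sketch that keeps the chunk files out of the current directory follows; the variable name chunk_dir and the part_ prefix are illustrative choices.

```shell
#!/usr/bin/env bash
# chunk_dir is an illustrative variable name; the files go into a temp directory.
chunk_dir=$(mktemp -d)

# printf avoids the trailing newline that echo would append to the last chunk.
printf '%s' "abcdefghijklmnopqr" | split -b 5 - "$chunk_dir/part_"

ls "$chunk_dir"            # part_aa part_ab part_ac part_ad
cat "$chunk_dir/part_aa"   # abcde
```

The "-" argument tells split to read standard input, and the following argument is the output file prefix.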

Answer 7

sed can do it:

 sed -nr ':a;h;s/(.{5}).*/\1/p;g;s/.//;ta;' <<<"ABCDEFGHIJKLM" | # split string
     sed '=' | sed '1~2s/^/>/' # add line numbers and insert '>'
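
The first sed script is dense, so here is the same loop spread over several lines with a comment per command. This is a reading aid under the assumption that your sed accepts -E for extended regular expressions (GNU sed also accepts -r, as used above); windows_sed is a hypothetical wrapper name.

```shell
#!/usr/bin/env bash
# windows_sed is a hypothetical wrapper around the windowing part of the script.
windows_sed() {
    sed -nE '
        :a
        # Save the current remainder in the hold space.
        h
        # Cut the pattern space down to its first five characters and print it;
        # this fails and prints nothing once fewer than five characters remain.
        s/(.{5}).*/\1/p
        # Restore the saved remainder.
        g
        # Drop its first character.
        s/.//
        # Branch back to :a if a substitution succeeded since the last branch.
        ta
    ' <<<"$1"
}

windows_sed "ABCDEFG"
```

The loop keeps running after the last full window until the remainder is empty, but those passes print nothing because the first substitution no longer matches.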

Answer 8

You could use cut and specify characters instead of fields, and then change the output delimiter to whatever you need, such as a newline:

echo "ABCDEFGHIJKLMNOP" | cut --output-delimiter=$'\n' -c1-5,6-10,11-15

output

ABCDE
FGHIJ
KLMNO

or

echo "ABCDEFGHIJKLMNOP" | cut --output-delimiter=$':' -c1-5,6-10,11-15

output

ABCDE:FGHIJ:KLMNO

Answer 9

fold -w5 should do the trick.

$ echo "ABCDEFGHIJKLMNOPQRSTUVWXYZ" | fold -w5
ABCDE
FGHIJ
KLMNO
PQRST
UVWXY
Z

Cheers!

Answer 10

Thanks to you guys I was able to find a way to do this fast! This is my solution, combining a few ideas from here:

str="ABCDEFGHIJKLMNOP"
splitfive(){
    echo "$1" | cut -c "$2"- | sed -r 's/(.{5})/\1\n/g'
}
# cut numbers characters from 1, so the start positions run from 1 to 5
for (( i=1; i <= 5; i++ )); do
    splitfive "$str" "$i"
done | grep -v "^$"

[The above answer was initially added to the question itself. Here are the relevant comments.]

Your splitfive could be more efficient. There's no need to pipe to cut; in bash you could say cut -c "$2"- <<<"$1" | sed etc. and it will be slightly better. -- sorpigal, Sep 28 '11 at 11:48

Your sed expression could also be improved to sed 's/...../&\\n/g', which executes about twice as fast. -- sorpigal, Sep 28 '11 at 11:56
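
Putting both comments together, splitfive might end up roughly as below. This is a sketch that assumes GNU sed, where \n in the replacement inserts a newline; inside a single-quoted shell string one backslash is enough, so the expression is written here as &\n. The loop starts at 1 because cut numbers characters from 1.

```shell
#!/usr/bin/env bash
str="ABCDEFGHIJKLMNOP"

splitfive() {
    # $1: the string, $2: 1-based start column.
    # The here-string replaces the extra echo process; & stands for the match.
    cut -c "$2"- <<<"$1" | sed 's/...../&\n/g'
}

for (( i = 1; i <= 5; i++ )); do
    splitfive "$str" "$i"
done | grep -v '^$'
```

As in the original solution, the pieces come out grouped by start column rather than in sliding order, and a leftover shorter than five characters is printed as its own line.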

10 answers

Answer 1: score 17, 2011-09-27 11:15:11
Answer 2: score 9, 2011-09-27 11:56:35
Answer 3: score 2, 2011-09-27 13:30:31
Answer 4: score 1, 2011-09-27 11:16:38
Answer 5: score 1, 2011-09-27 12:00:53
Answer 6: score 1, 2011-09-27 12:05:56
Answer 7: score 1, 2011-09-27 16:25:55
Answer 8: score 0, 2013-10-30 05:59:25
Answer 9: score 0, 2018-11-07 22:02:33
Answer 10: score 0
