
delphi - strip out all non-standard text characters from a string

I need to strip out all non-standard text characters from a string. That is, I need to remove all non-ASCII and control characters (except line feeds and carriage returns).

And here's a variant of Cosmin's that only walks the string once, but uses an efficient allocation pattern:

function StrippedOfNonAscii(const s: string): string;
var
  i, Count: Integer;
begin
  SetLength(Result, Length(s));
  Count := 0;
  for i := 1 to Length(s) do begin
    if ((s[i] >= #32) and (s[i] <= #127)) or (s[i] in [#10, #13]) then begin
      inc(Count);
      Result[Count] := s[i];
    end;
  end;
  SetLength(Result, Count);
end;

Something like this should do:

// For those who need a disclaimer:
// This code is a sample that shows how the basic check for non-ASCII characters goes.
// It performs poorly when called often on long strings.
// Use a TStringBuilder, or SetLength and an Integer loop index, to optimize.
// If you need really optimized code, pass this on to the FastCode people.
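function StripNonAsciiExceptCRLF(const Value: AnsiString): AnsiString;
var
  AnsiCh: AnsiChar;
begin
  Result := ''; // a string Result is not guaranteed to start out empty
  for AnsiCh in Value do
    // keep #32..#127, plus CR and LF as the function name promises
    if ((AnsiCh >= #32) and (AnsiCh <= #127)) or (AnsiCh = #13) or (AnsiCh = #10) then
      Result := Result + AnsiCh;
end;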

For UnicodeString you can do something similar.
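As a rough sketch of what such a UnicodeString variant could look like: the function name `StripNonAsciiExceptCRLFW` is hypothetical, the single-pass SetLength pattern is borrowed from the other answers on this page, and 1-based string indexing (as on the desktop compilers) is assumed. The upper bound #127 mirrors the range used in the answers here; use #126 instead if DEL should be dropped as well.

```pascal
program StripUnicodeDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF} // only matters when built with Free Pascal
{$APPTYPE CONSOLE}

// Hypothetical name, not taken from any answer on this page.
// Keeps #32..#127 plus CR and LF; drops everything else,
// including all characters above #127.
function StripNonAsciiExceptCRLFW(const Value: UnicodeString): UnicodeString;
var
  i, Count: Integer;
  Ch: WideChar;
begin
  SetLength(Result, Length(Value)); // worst case: nothing is removed
  Count := 0;
  for i := 1 to Length(Value) do    // assumes 1-based string indexing
  begin
    Ch := Value[i];
    // Plain comparisons instead of "in": a Pascal set only covers #0..#255
    if ((Ch >= #32) and (Ch <= #127)) or (Ch = #13) or (Ch = #10) then
    begin
      Inc(Count);
      Result[Count] := Ch;
    end;
  end;
  SetLength(Result, Count);         // trim to the number of characters kept
end;

begin
  // The e-acute (#$00E9) and the tab (#9) are removed
  WriteLn(StripNonAsciiExceptCRLFW('caf'#$00E9' ok'#9'!')); // prints "caf ok!"
end.
```

Comparing each WideChar directly avoids the narrowing that `in` with a character set would cause on a Unicode string.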

If you don't need to do it in place and generating a copy of the string is fine, try this code:

 type CharSet=Set of Char;

 function StripCharsInSet(s:string; c:CharSet):string;
  var i:Integer;
  begin
     result:='';
     for i:=1 to Length(s) do
       if not (s[i] in c) then 
         result:=result+s[i];
  end;  

and use it like this:

 s := StripCharsInSet(s,[#0..#9,#11,#12,#14..#31,#127]);

EDIT: added #127 for the DEL control character.

EDIT2: this is a faster version, thanks ldsandon:

 function StripCharsInSet(s:string; c:CharSet):string;
  var i,j:Integer;
  begin
     SetLength(result,Length(s));
     j:=0;
     for i:=1 to Length(s) do
       if not (s[i] in c) then 
        begin
         inc(j);
         result[j]:=s[i];
        end;
     SetLength(result,j);
  end;  

Here's a version that doesn't build the string by appending char-by-char, but allocates the whole string in one go. It requires going over the string twice, once to count the "good" chars and once to actually copy them, but it's worth it because it avoids multiple reallocations:

function StripNonAscii(s:string):string;
var Count, i:Integer;
begin
  Count := 0;
  for i:=1 to Length(s) do
    if ((s[i] >= #32) and (s[i] <= #127)) or (s[i] in [#10, #13]) then
      Inc(Count);
  if Count = Length(s) then
    Result := s // No characters need to be removed, return the original string (no mem allocation!)
  else
    begin
      SetLength(Result, Count);
      Count := 1;
      for i:=1 to Length(s) do
        if ((s[i] >= #32) and (s[i] <= #127)) or (s[i] in [#10, #13]) then
        begin
          Result[Count] := s[i];
          Inc(Count);
        end;
    end;
end;

My performance-oriented solution:

function StripNonAnsiChars(const AStr: String; const AIgnoreChars: TSysCharSet): string;
var
  lBuilder: TStringBuilder;
  I: Integer;
begin
  lBuilder := TStringBuilder.Create;
  try
    for I := 1 to AStr.Length do
      if CharInSet(AStr[I], [#32..#127] + AIgnoreChars) then
        lBuilder.Append(AStr[I]);
    Result := lBuilder.ToString;
  finally
    FreeAndNil(lBuilder);
  end;
end;

I wrote this in Delphi XE7.

My version, with the result as an array of byte:

interface

type
  TSBox = array of byte;

and the function:

function StripNonAscii(buf: array of byte): TSBox;
var temp: TSBox;
    countr, countr2: integer;
const validchars : TSysCharSet = [#32..#127];
begin
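Result := nil; // empty input yields an empty array
if Length(buf) = 0 then exit;
countr2:= 0;
SetLength(temp, Length(buf)); // size temp to the length of buf
for countr := 0 to High(buf) do if CharInSet(chr(buf[countr]), validchars) then // High(buf) is the last valid index; Length(buf) reads one past the end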
  begin
    temp[countr2] := buf[countr];
    inc(countr2); //count valid chars
  end;
SetLength(temp, countr2);
Result := temp;
end;
