
SAS Data Step Truncating Values

I have a series of datasets that are created by pulling information from a database with PROC SQL statements. For one field (Measure), I use a CASE statement to create a "definition" of sorts. I then use a data step to combine these datasets. However, this field is truncated when the data step stacks them together.

For example, 'Portfolio Balance (w/ Eco-Charge Offs)' is truncated to 'Portfolio Balance (w/ Eco-Charge Off', and 'Application Volume' is truncated to 'Application Volum'.

Below is the data step. I've tried using FORMAT and LENGTH to force the number of characters, but the values are still truncated. I also created a dummy dataset, PLACEHOLDERS, containing 50-character values, so that the longest value would be read first, but that hasn't helped either.

DATA Data.COMBINED;
  format measure $45.;
  SET
    Data.PLACEHOLDERS
    Data.GSK
    Data.SSS
    Data.MF
    Data.SRT
  ;
RUN;

Again, if I look at the results returned by each PROC SQL statement, the full values are shown. It's only when I stack the datasets together in the data step that the values start being truncated. Thoughts?

It would be best to modify the code that creates the original datasets so that they all share the same structure.
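As a sketch of that idea, assuming each source query builds MEASURE with a CASE expression: PROC SQL lets you fix a column's length and format directly in the SELECT list with the LENGTH= and FORMAT= column modifiers, so every extract defines MEASURE identically. The table WORK.RAW_METRICS, its column METRIC_CODE, the codes, and the length of 50 are hypothetical stand-ins for the real database query.

```sas
/* Hypothetical stand-in for the database table the real query reads */
data work.raw_metrics;
  length metric_code $8;
  input metric_code $ amount;
  datalines;
PBAL 100
APPVOL 25
;
run;

proc sql;
  create table work.gsk as
  select
    /* LENGTH= fixes the stored length regardless of which THEN value */
    /* is longest; FORMAT= keeps the display width equal to it         */
    case metric_code
      when 'PBAL'   then 'Portfolio Balance (w/ Eco-Charge Offs)'
      when 'APPVOL' then 'Application Volume'
      else 'Other'
    end as measure length=50 format=$50.,
    amount
  from work.raw_metrics;
quit;
```

Using the same LENGTH= value in every extract means that a later data step or SQL union has no conflicting definitions to reconcile.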

There are two ways that combining two or more datasets can lead to truncation (or apparent truncation) of character variables.

The first is physical truncation: the variable is defined shorter in the data step than it is in one of the source datasets. SAS defines a variable the first time it encounters it, so if the first dataset has MEASURE with a length of $20, that is the length it gets. The solution is similar to your attempt, except that you should use a LENGTH or ATTRIB statement to define the variable length explicitly, instead of leaving SAS to guess the definition from the fact that the variable first appears in a FORMAT statement.
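A minimal reproduction, using two hypothetical datasets, WORK.SHORT and WORK.LONG, as stand-ins for the extracts: MEASURE has length 18 in the first and 38 in the second. A plain SET defines the combined variable from the first dataset, so the longer value is cut. A LENGTH statement placed before SET fixes this. The length of 50 is an assumed maximum; use whatever the longest real value requires.

```sas
/* First input: MEASURE defined with length 18 */
data work.short;
  length measure $18;
  measure = 'Application Volume';
run;

/* Second input: MEASURE defined with length 38 */
data work.long;
  length measure $38;
  measure = 'Portfolio Balance (w/ Eco-Charge Offs)';
run;

/* Without LENGTH: MEASURE takes its definition from WORK.SHORT ($18), */
/* so the second value is cut to 18 characters. SAS also writes a       */
/* warning about multiple lengths to the log.                           */
data work.truncated;
  set work.short work.long;
run;

/* With LENGTH before SET: MEASURE is defined as $50 before the */
/* compiler reads either input dataset, so nothing is cut        */
data work.fixed;
  length measure $50;
  set work.short work.long;
run;
```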

The second kind of truncation may exist only in how the values are displayed. If you have attached a format whose width is shorter than the variable's length, the values will appear truncated in output even though the stored values are complete. This is especially likely when the datasets are generated by pulling from an external database, because PROC SQL will automatically assign a format that matches the length of the variable. For character variables, the easiest solution is to remove those formats. SAS does not need a format to display character values.

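data combined;
  /* Define lengths explicitly, before SET, so the first dataset read does not decide them */
  length var1 $40 var2 $20;
  set gsk mf;
  /* A FORMAT statement naming variables but no format removes their formats */
  format _character_;
run;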
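To check which of the two cases applies, one option is to list the length and format of MEASURE in each source dataset from DICTIONARY.COLUMNS. A length that differs across datasets points to physical truncation. A format narrower than the length points to display truncation. The sketch below builds two hypothetical sources, WORK.SRC_A and WORK.SRC_B, so that it runs on its own. For the question's data, you would query libname = 'DATA' and the members PLACEHOLDERS, GSK, SSS, MF and SRT instead.

```sas
/* Hypothetical source whose format ($17.) is narrower than its length (38) */
data work.src_a;
  length measure $38;
  format measure $17.;
  measure = 'Portfolio Balance (w/ Eco-Charge Offs)';
run;

/* Hypothetical source with a shorter length and no format */
data work.src_b;
  length measure $18;
  measure = 'Application Volume';
run;

/* Length and format of MEASURE in each source dataset */
proc sql;
  select memname, name, length, format
    from dictionary.columns
    where libname = 'WORK'
      and memname in ('SRC_A', 'SRC_B')
      and upcase(name) = 'MEASURE'
    order by memname;
quit;

/* The $17. format displays only the first 17 characters, */
/* although the stored value is complete                  */
proc print data=work.src_a;
run;
```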

Actually, PROC SQL is pretty good at resolving length issues on its own, so it might be easier to combine the datasets that way.

proc sql;
  /* CORR matches columns by name; ALL keeps duplicate rows, like stacking with SET */
  create table combined as
    select * from gsk
    union corr all
    select * from mf
  ;
quit;
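A self-contained sketch of the PROC SQL approach, with hypothetical WORK.GSK and WORK.MF datasets whose MEASURE lengths differ (18 and 38). The union's MEASURE column is sized to the longer input, so neither value is cut. The PROC DATASETS step is an optional extra, assuming you also want to strip any format the column inherits, so that the display width cannot be narrower than the stored length.

```sas
/* Two hypothetical extracts with different MEASURE lengths */
data work.gsk;
  length measure $18;
  measure = 'Application Volume';
run;

data work.mf;
  length measure $38;
  measure = 'Portfolio Balance (w/ Eco-Charge Offs)';
run;

/* The union sizes MEASURE to the longest input length */
proc sql;
  create table work.combined as
    select * from work.gsk
    union corr all
    select * from work.mf;
quit;

/* Optional: remove any format attached to MEASURE in the result */
proc datasets library=work nolist;
  modify combined;
    format measure;
quit;
```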
