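SAS or PostgreSQL: add a column value according to another column's value
I want to add an "index" column whose value is the same for all rows that share the same value in the "variable" column. Either PostgreSQL or SAS syntax is fine.
One catch: the values in the "variable" column change every day, as tableA and tableB show, so hard-coding is not acceptable. Any suggestions are appreciated!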
+----------+--------------+------+-------+-----+-----+-------+
| variable | new_variable | type | start | end | woe | index |
+----------+--------------+------+-------+-----+-----+-------+
| A | mi_A | char | 1 | | 1.3 | 1 |
| A | mi_A | char | 0 | | 0.6 | 1 |
| B | mi_B | char | 1 | | 5.4 | 2 |
| B | mi_B | char | 0 | | 0.1 | 2 |
| gnd_cd | gnd_cd | char | 3 | | 1.3 | 3 |
| gnd_cd | gnd_cd | char | @0 | | 0.6 | 3 |
| gnd_cd | gnd_cd | char | 2 | | 5.4 | 3 |
| gnd_cd | gnd_cd | char | N | | 0.1 | 3 |
| gnd_cd | gnd_cd | char | 1 | | 1.3 | 3 |
| gnd_cd | gnd_cd | char | 99 | | 0.6 | 3 |
| mar_sign | mar_sign | char | 0 | | 5.4 | 4 |
| mar_sign | mar_sign | char | Y | | 0.1 | 4 |
| mar_sign | mar_sign | char | N | | 6 | 4 |
| C | C | char | 6 | | 2 | 5 |
| C | C | char | 7 | | 2.1 | 5 |
| C | C | char | 8 | | 2.2 | 5 |
+----------+--------------+------+-------+-----+-----+-------+
(tableA)
+--------------+--------------+------+-------+-----+-----+-------+
| variable | new_variable | type | start | end | woe | index |
+--------------+--------------+------+-------+-----+-----+-------+
| D | mi_D | char | 1 | | 1 | 1 |
| D | mi_D | char | 0 | | 2 | 1 |
| E | mi_E | char | 1 | | 2 | 2 |
| E | mi_E | char | 0 | | 3 | 2 |
| education_bg | education_bg | char | 3 | | 1 | 3 |
| education_bg | education_bg | char | @0 | | 5 | 3 |
| education_bg | education_bg | char | 2 | | 6 | 3 |
| education_bg | education_bg | char | N | | 4 | 3 |
| education_bg | education_bg | char | 1 | | 3 | 3 |
| education_bg | education_bg | char | 99 | | 3 | 3 |
| sex | sex | char | 0 | | 2 | 4 |
| sex | sex | char | Y | | 1 | 4 |
| sex | sex | char | N | | 0 | 4 |
| C | C | char | 6 | | 6 | 5 |
| C | C | char | 7 | | 4 | 5 |
| C | C | char | 8 | | 1 | 5 |
+--------------+--------------+------+-------+-----+-----+-------+
(tableB)
You can do this in SAS in a single data step, using a RETAIN statement and BY-group processing on variable.
Code:
data have;
infile datalines dlm='|';
input variable $ new_variable $ type $ start $ end $ woe ;
datalines;
| A | mi_A | char | 1 | | 1.3
| A | mi_A | char | 0 | | 0.6
| B | mi_B | char | 1 | | 5.4
| B | mi_B | char | 0 | | 0.1
| gnd_cd | gnd_cd | char | 3 | | 1.3
| gnd_cd | gnd_cd | char | @0 | | 0.6
| gnd_cd | gnd_cd | char | 2 | | 5.4
| gnd_cd | gnd_cd | char | N | | 0.1
| gnd_cd | gnd_cd | char | 1 | | 1.3
| gnd_cd | gnd_cd | char | 99 | | 0.6
| mar_sign | mar_sign | char | 0 | | 5.4
| mar_sign | mar_sign | char | Y | | 0.1
| mar_sign | mar_sign | char | N | | 6
| C | C | char | 6 | | 2
| C | C | char | 7 | | 2.1
| C | C | char | 8 | | 2.2
run;
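data want;
  set have;
  by variable notsorted;          /* each consecutive run of equal values is one group */
  retain index;                   /* keep index across rows (the sum statement below also implies this) */
  if first.variable then index+1; /* first row of a new group: bump the counter */
run;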
Note: index is incremented only when a new group value begins; every other row in the group keeps the current value.
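To make the BY ... NOTSORTED behaviour concrete, here is a minimal Python sketch of the same logic. It is not SAS and not the answerer's code; the function name index_by_runs and the sample list are made up for illustration. The point it shows is that the counter follows consecutive runs, so a value that reappears later in the data gets a new index.

```python
from itertools import groupby


def index_by_runs(values):
    """Number consecutive runs of equal values, starting at 1.

    Mirrors the data step: the counter goes up when a new run starts
    (first.variable) and is carried along for the rest of the run (retain).
    """
    indexed = []
    for run_number, (value, run) in enumerate(groupby(values), start=1):
        # Every row in the same run receives the same run number.
        indexed.extend((value, run_number) for _ in run)
    return indexed


# Hypothetical sample: "A" shows up again after "B".
rows = ["A", "A", "B", "A"]
print(index_by_runs(rows))
```

If the same value can appear in more than one place and should still share a single index, the data would have to be sorted or grouped first.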
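You can create a new table along these lines:
select variable, monotonic() as newindex from (select distinct variable from mydata);
Then join it back to the original table. In fact, you can do it all in one step:
select a.variable, a.new_variable, a.type, a.start, a.end, a.woe, b.newindex as index
from mydata as a
left join (select variable, monotonic() as newindex from (select distinct variable from mydata)) as b
on a.variable = b.variable;
Or something like that. I can't say I understand 100% what you are trying to achieve, but this may help.
Note that the monotonic() function in SAS is still (I believe) undocumented. That means SAS may or may not keep including it. It works, but perhaps they consider it experimental.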
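The lookup-and-join idea above can be sketched in Python as follows. This is an illustration only: build_lookup, attach_index and the sample rows are hypothetical names and data, not part of the original answer. One deliberate difference is that the sketch numbers values in order of first appearance, whereas the SQL version makes no promise about the order in which the distinct values are numbered.

```python
def build_lookup(values):
    """Map each distinct value to 1, 2, 3, ... in order of first appearance."""
    lookup = {}
    for value in values:
        if value not in lookup:
            lookup[value] = len(lookup) + 1
    return lookup


def attach_index(rows, key="variable"):
    """Return copies of the rows with an added 'index' taken from the lookup."""
    lookup = build_lookup(row[key] for row in rows)
    # This plays the role of the left join on a.variable = b.variable.
    return [{**row, "index": lookup[row[key]]} for row in rows]


# Hypothetical sample rows, loosely based on tableA.
mydata = [
    {"variable": "A", "start": "1", "woe": 1.3},
    {"variable": "A", "start": "0", "woe": 0.6},
    {"variable": "B", "start": "1", "woe": 5.4},
    {"variable": "C", "start": "6", "woe": 2.0},
]
for row in attach_index(mydata):
    print(row)
```

Because the lookup is keyed on the value itself, a value that reappears later in the data still receives its original index, unlike the run-based data step.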
Consider using Postgres's dense_rank window function for gap-free ranking:
SELECT *, DENSE_RANK() OVER (ORDER BY variable) as "index"
FROM mytable
Rextester demo (with randomly seeded data)
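For readers who want to see what a dense rank ordered by variable produces, here is a small Python sketch of the same idea. It is an approximation under stated assumptions: the function name dense_rank and the sample list are made up, and Python compares strings by code point, while the order Postgres uses depends on the column's collation.

```python
def dense_rank(values):
    """Rank values by sorted order; ties share a rank and ranks have no gaps."""
    ordered = sorted(set(values))
    rank_of = {value: rank for rank, value in enumerate(ordered, start=1)}
    return [rank_of[value] for value in values]


# Hypothetical sample using the variable names from tableA, in table order.
variables = ["A", "A", "B", "gnd_cd", "mar_sign", "C"]
print(list(zip(variables, dense_rank(variables))))
```

Note that the ranks follow sort order, so C comes out as 3 here, while the sample table in the question numbers it 5. If the numbering has to follow the order in which values appear in the table, the query would presumably need an explicit ordering column, which the question does not show.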