SAS: combining datasets with SET and BY, in Python (pandas)
I have 2 SAS datasets and I'm using a SET statement with BY to combine them into a new dataset by a key column (ID). Here is how the data looks (note: '.' is a missing or null value):
data1
ID  X   Y
01  12  11
02  15  .
03  18  .

data2
ID  X   Z
01  .   4
03  17  6
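To reproduce this in pandas, the two datasets can be built like this. It is a sketch that assumes ID is stored as a zero-padded string and that every SAS missing value ('.') becomes NaN.

```python
import numpy as np
import pandas as pd

# Assumption: ID is a zero-padded string; SAS missing values ('.') become NaN.
data1 = pd.DataFrame({
    "ID": ["01", "02", "03"],
    "X": [12, 15, 18],
    "Y": [11, np.nan, np.nan],
})
data2 = pd.DataFrame({
    "ID": ["01", "03"],
    "X": [np.nan, 17],
    "Z": [4, 6],
})
```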
data combine;
set data1 data2;
by id;
run;
The resulting output dataset 'combine' looks like this:
combine
ID  X   Y   Z
01  12  11  .
01  .   .   4
02  15  .   .
03  18  .   .
03  17  .   6
Can anyone tell me how to do this in pandas/Python? I tried pd.concat(), but it doesn't give the desired output above. I'd appreciate any help.
concat
pd.concat([data1, data2], ignore_index=True).sort_values('ID', kind='stable')
   ID     X     Y    Z
0  01  12.0  11.0  NaN
3  01   NaN   NaN  4.0
1  02  15.0   NaN  NaN
2  03  18.0   NaN  NaN
4  03  17.0   NaN  6.0
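SAS interleaving keeps the rows of data1 ahead of the rows of data2 within each BY group. The default algorithm of sort_values ('quicksort') is not guaranteed to be stable, so a minimal sketch of the equivalent asks for a stable sort explicitly. The helper name `interleave_by` is hypothetical, and the frames are rebuilt under the same assumptions as above (string IDs, NaN for '.').

```python
import numpy as np
import pandas as pd


def interleave_by(frames, key):
    """Hypothetical helper emulating SAS `SET a b; BY key;` (interleaving).

    Rows are stacked in the order the frames are listed, then sorted by `key`
    with a stable sort, so within each key value earlier frames come first.
    """
    stacked = pd.concat(frames, ignore_index=True)
    return stacked.sort_values(key, kind="stable", ignore_index=True)


data1 = pd.DataFrame({"ID": ["01", "02", "03"], "X": [12, 15, 18], "Y": [11, np.nan, np.nan]})
data2 = pd.DataFrame({"ID": ["01", "03"], "X": [np.nan, 17], "Z": [4, 6]})

combine = interleave_by([data1, data2], "ID")
print(combine)
```

Passing ignore_index=True to sort_values renumbers the result 0..n-1, which is closer to how a SAS dataset has no row labels.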
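append (note: DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so on current versions use the concat form above)
data1.append(data2, ignore_index=True).sort_values('ID', kind='stable')
   ID     X     Y    Z
0  01  12.0  11.0  NaN
3  01   NaN   NaN  4.0
1  02  15.0   NaN  NaN
2  03  18.0   NaN  NaN
4  03  17.0   NaN  6.0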
Sorry... basically, all I need is a 'flag' to indicate whether the data comes from data1 or data2. In simple words: create a new variable called 'flag' and set it to 1 if the row comes from data1 and to 2 if it comes from data2. Hope it's clear... Thanks again! – user11580242
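One way to get such a flag is to tag each frame before stacking with DataFrame.assign, which returns a copy with the new column. This is a sketch assuming the same data as in the question (rebuilt so the snippet runs on its own); the values 1 and 2 follow the comment, and putting 'flag' first is only a presentation choice.

```python
import numpy as np
import pandas as pd

data1 = pd.DataFrame({"ID": ["01", "02", "03"], "X": [12, 15, 18], "Y": [11, np.nan, np.nan]})
data2 = pd.DataFrame({"ID": ["01", "03"], "X": [np.nan, 17], "Z": [4, 6]})

# Tag every row with its source dataset: 1 for data1, 2 for data2.
combine = pd.concat(
    [data1.assign(flag=1), data2.assign(flag=2)],
    ignore_index=True,
)

# Stable sort by ID so data1 rows stay ahead of data2 rows within each ID.
combine = combine.sort_values("ID", kind="stable", ignore_index=True)

# Optional: move the flag column to the front.
combine = combine[["flag", "ID", "X", "Y", "Z"]]
print(combine)
```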
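You can use concat with a dict of DataFrames: the dict keys (1 and 2) become an outer index level named 'flag', and reset_index('flag') turns that level into a column: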
pd.concat({1: data1, 2: data2}, names=['flag']).reset_index('flag').sort_values('ID', kind='stable')
   flag  ID     X     Y    Z
0     1  01  12.0  11.0  NaN
0     2  01   NaN   NaN  4.0
1     1  02  15.0   NaN  NaN
2     1  03  18.0   NaN  NaN
1     2  03  17.0   NaN  6.0
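The dict keys do not have to be written out by hand. With more than two inputs (as in `set a b c; by id;` in SAS), a dict comprehension over enumerate can number the sources automatically. A sketch; the function name `combine_with_flag` is hypothetical, and it applies the same concat/reset_index idiom as the answer, plus a stable sort.

```python
import numpy as np
import pandas as pd


def combine_with_flag(frames, key, flag="flag"):
    """Hypothetical helper: stack frames, store each row's 1-based source
    position in `flag`, and interleave by `key` with a stable sort."""
    keyed = {i: df for i, df in enumerate(frames, start=1)}
    # The dict keys become an outer index level named `flag`;
    # reset_index moves that level into a column at position 0.
    stacked = pd.concat(keyed, names=[flag]).reset_index(flag)
    return stacked.sort_values(key, kind="stable", ignore_index=True)


data1 = pd.DataFrame({"ID": ["01", "02", "03"], "X": [12, 15, 18], "Y": [11, np.nan, np.nan]})
data2 = pd.DataFrame({"ID": ["01", "03"], "X": [np.nan, 17], "Z": [4, 6]})

combine = combine_with_flag([data1, data2], "ID")
print(combine)
```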