How to identify which cities each person has lived in, and when?
Here is a small sample of the dataset I am currently working on.
FirstName  LastName  cities        occupation         time
----------------------------------------------------------------
Alice      Oumi      Queens        software engineer  1/1/2019
Alice      Oumi      New York      software engineer  12/3/2018
Sam        Charles   Santa Clara   Engineer           2/5/2017
Sam        Charles   Santa Monica  Engineer           8/9/2018
Sam        Charles   Santa Clara   Engineer           12/12/2019
Alice      Oumi      New York      software engineer  1/2/2017
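To try the answers on this sample, the table can be read straight into pandas. This sketch assumes the columns are separated by at least two spaces, as in an aligned table; a single space cannot serve as the separator because values such as New York and software engineer contain spaces. The variable names raw and df are illustrative.

```python
import io

import pandas as pd

# The sample table; columns are separated by two or more spaces,
# while single spaces only occur inside values such as "New York"
raw = """\
FirstName  LastName  cities        occupation         time
Alice      Oumi      Queens        software engineer  1/1/2019
Alice      Oumi      New York      software engineer  12/3/2018
Sam        Charles   Santa Clara   Engineer           2/5/2017
Sam        Charles   Santa Monica  Engineer           8/9/2018
Sam        Charles   Santa Clara   Engineer           12/12/2019
Alice      Oumi      New York      software engineer  1/2/2017
"""

# A regex separator like this needs the Python parsing engine
df = pd.read_csv(io.StringIO(raw), sep=r'\s{2,}', engine='python')
print(df)
```

The time column stays as plain M/D/YYYY strings here; it only needs converting if you want to sort by date.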
As you can see above, the same person may have lived in the same place at different times. I want to clean this dataset so that it shows which places Alice and Sam lived in.
For example, instead of having two rows for Alice living in New York, I only need one. Something similar to the following table:
FirstName  LastName  cities        FirstTime  SecondTime
----------------------------------------------------------
Alice      Oumi      Queens        1/1/2019   NA
Alice      Oumi      New York      1/2/2017   12/3/2018
Sam        Charles   Santa Clara   2/5/2017   12/12/2019
Sam        Charles   Santa Monica  8/9/2018   NA
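Collapsing the repeated rows does not need a loop in pandas. A minimal sketch, assuming the data sits in a DataFrame named df with the columns shown above (rebuilt here from the sample so the snippet runs on its own): drop_duplicates keeps one row per person and city, and groupby(...).agg(list) keeps every date for that person and city in a single list, so any number of stays fits in one row.

```python
import pandas as pd

# Sample rows from the question (dates kept as M/D/YYYY strings)
df = pd.DataFrame({
    'FirstName':  ['Alice', 'Alice', 'Sam', 'Sam', 'Sam', 'Alice'],
    'LastName':   ['Oumi', 'Oumi', 'Charles', 'Charles', 'Charles', 'Oumi'],
    'cities':     ['Queens', 'New York', 'Santa Clara', 'Santa Monica', 'Santa Clara', 'New York'],
    'occupation': ['software engineer', 'software engineer', 'Engineer',
                   'Engineer', 'Engineer', 'software engineer'],
    'time':       ['1/1/2019', '12/3/2018', '2/5/2017', '8/9/2018', '12/12/2019', '1/2/2017'],
})

# Which places each person lived in: one row per person and city
places = df[['FirstName', 'LastName', 'cities']].drop_duplicates().reset_index(drop=True)

# Same grouping, but keeping every recorded date (in row order) as a list
stays = (
    df.groupby(['FirstName', 'LastName', 'cities'])['time']
      .agg(list)
      .reset_index(name='times')
)

print(places)
print(stays)
```

The dates in each list follow the order of the rows in the data, not the calendar, so parse and sort them first if the lists should be chronological.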
I am fairly new to Python and still learning. I have tried using for loops with iterrows(), but that didn't work.
What can I use to produce this table?
Thank you so much in advance.
You can do that as follows:
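import pandas as pd

# df holds the data from the question (FirstName, LastName, cities, occupation, time)
# number the times a person lived in the same city (with the same occupation)
df['sequence'] = df.groupby(['FirstName', 'LastName', 'cities', 'occupation']).cumcount() + 1
# now create the "pivot" table: 'time' is the only column left outside the index,
# so unstacking 'sequence' spreads it into one column per stay
result = df.set_index(['FirstName', 'LastName', 'cities', 'occupation', 'sequence']).unstack()
# rename the columns (this assumes no one lived in the same city more than twice)
result.columns = ['FirstTime', 'SecondTime']
# reset the index (it was only needed for "pivoting")
result.reset_index(inplace=True)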
The result looks like this:
   FirstName  LastName        cities         occupation  FirstTime  SecondTime
0      Alice      Oumi      New York  software engineer  12/3/2018    1/2/2017
1      Alice      Oumi        Queens  software engineer   1/1/2019         NaN
2        Sam   Charles   Santa Clara           Engineer   2/5/2017  12/12/2019
3        Sam   Charles  Santa Monica           Engineer   8/9/2018         NaN
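In this output Alice's New York row has 12/3/2018 as FirstTime and 1/2/2017 as SecondTime, because cumcount() numbers rows in the order they appear in the data, not by date. If FirstTime should be the earlier date, as in the table from the question, one variant is to parse the dates and sort before numbering. A minimal sketch, assuming every date is month/day/year and, like the answer, producing exactly two date columns; keys is an illustrative helper name.

```python
import pandas as pd

# Sample rows from the question (dates as M/D/YYYY strings)
df = pd.DataFrame({
    'FirstName':  ['Alice', 'Alice', 'Sam', 'Sam', 'Sam', 'Alice'],
    'LastName':   ['Oumi', 'Oumi', 'Charles', 'Charles', 'Charles', 'Oumi'],
    'cities':     ['Queens', 'New York', 'Santa Clara', 'Santa Monica', 'Santa Clara', 'New York'],
    'occupation': ['software engineer', 'software engineer', 'Engineer',
                   'Engineer', 'Engineer', 'software engineer'],
    'time':       ['1/1/2019', '12/3/2018', '2/5/2017', '8/9/2018', '12/12/2019', '1/2/2017'],
})

keys = ['FirstName', 'LastName', 'cities', 'occupation']

# Parse the strings as real dates so sorting is chronological, not alphabetical
df['time'] = pd.to_datetime(df['time'], format='%m/%d/%Y')
df = df.sort_values('time')

# Number each stay per person and city in date order
df['sequence'] = df.groupby(keys).cumcount() + 1

# Spread the numbered stays into columns; a missing second stay becomes NaT
result = df.set_index(keys + ['sequence'])['time'].unstack()
result.columns = ['FirstTime', 'SecondTime']
result = result.reset_index()

print(result)
```

After parsing, the dates print as YYYY-MM-DD, and a missing second stay shows as NaT (the datetime counterpart of NaN).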
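Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost this content, please credit this site's URL or the original source. For any questions, contact: yoyou2525@163.com.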