How to fill in values for a column in a dataframe by matching values from another dataframe in pandas
I'm new to Python and am working with the Kaggle Titanic dataset to practice.
I'm trying to fill in a couple of missing values for the Cabin feature by using rows that have the same ticket. That is, I want to get a list of duplicate tickets and their corresponding cabin values, and replace the null values with the cabin value corresponding to the same ticket.
In my approach, I created a dataframe with the following code. It contains only one occurrence of each duplicate ticket (provided that the ticket has a non-null cabin value to go with it), so that each ticket is assigned a single cabin value. This way I could fill in the cabin values in the training set (maindf) by matching.
ticket_dupl = maindf[(maindf.duplicated('Ticket')) & (maindf['Cabin'].notnull())][['Ticket','Cabin']].drop_duplicates('Ticket')
This gives me a dataframe of length 50 with the index preserved. Here are the first 7 rows:
       Ticket        Cabin
88      19950  C23 C25 C27
124     35281          D26
137    113803         C123
193    230080           F2
195  PC 17569          B80
230     36973          C83
251    347054           G6
Is there a way to fill in some cabin values in my maindf by matching ticket rows or indices, preserving the values for which tickets don't match? I can't seem to understand the other solutions to questions similar to mine.
Also, I was wondering if there is a more efficient way of achieving my goal than creating a dataframe like I did. Thanks.
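You can group by Ticket to bring rows with matching tickets together, and fill the null values using first_valid_index, which returns the index of the first non-null value in the group.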
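# Tickets with no cabin at all are left unchanged (first_valid_index() returns None for them)
df['Cabin'] = df.groupby('Ticket')['Cabin'].transform(lambda x: x.loc[x.first_valid_index()] if x.notna().any() else x)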
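As a rough sketch of the same idea, the snippet below fills only the rows whose Cabin is missing, so existing cabin values are never overwritten. Tickets with no cabin at all simply stay null. The small maindf here is made-up toy data, not the real Titanic file, and the names first_cabin, lookup, filled_groupby and filled_map are hypothetical. The second option builds a Ticket-to-Cabin lookup similar to ticket_dupl from the question, but from every row that has a cabin, since duplicated('Ticket') skips the first occurrence of each ticket by default.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the Titanic training set (values are made up).
maindf = pd.DataFrame({
    "Ticket": ["19950", "19950", "35281", "35281", "A/5 21171", "113803", "113803"],
    "Cabin": ["C23 C25 C27", np.nan, np.nan, "D26", np.nan, "C123", "C125"],
})

# Option 1: no helper dataframe.
# transform("first") broadcasts each ticket's first non-null Cabin back onto
# its rows (missing if the ticket has no cabin), and fillna only touches rows
# where Cabin is currently missing.
first_cabin = maindf.groupby("Ticket")["Cabin"].transform("first")
filled_groupby = maindf["Cabin"].fillna(first_cabin)

# Option 2: a Ticket -> Cabin lookup, one cabin per ticket, applied with map.
lookup = (
    maindf.dropna(subset=["Cabin"])
    .drop_duplicates("Ticket")
    .set_index("Ticket")["Cabin"]
)
filled_map = maindf["Cabin"].fillna(maindf["Ticket"].map(lookup))

print(filled_groupby.tolist())
print(filled_map.tolist())
```

To apply either result, assign it back, for example maindf['Cabin'] = filled_groupby. Both options should give the same values; the groupby version just avoids building the intermediate lookup.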