How do I iterate through two dataframes of different sizes?

Specifically, I want to iterate through two dataframes, one large and one small.

Ultimately, I would like to compare values within a certain column.

I tried creating a nested for loop, with the outer loop iterating through the large dataframe and the inner loop iterating through the small dataframe, but I am having difficulties.

I'm looking for a way to identify the "name" and "value" in my large dataframe that match my small dataframe.

Background info: I am using the pandas library.

Large dataframe:

[image not included]

Small dataframe:

Name     Value
SF       12.84
TH      -49.45

If the goal is to iterate through one, or especially more than one, DataFrame, then explicit for loops are usually the wrong move. In this case, because you're trying to

identify the "name" and "value" in my large dataframe that match my small dataframe,

the operation you're looking for is either pd.merge or pd.DataFrame.join, which do the comparisons "under the hood" and return the matching information. So, say you have the two DataFrames and they're called large and small. Then:

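import pandas as pd

# Left join: keep every row of large and record whether its
# (Name, Value) pair also appears in small.
new_large = pd.merge(left=large,
                     right=small,
                     how='left',
                     on=['Name', 'Value'],
                     indicator=True)

# Turn the indicator into 1 for rows found in both frames, 0 otherwise.
new_large['_merge'] = (new_large['_merge'] == 'both').astype(int)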

By doing a left join between large and small (how='left'), pd.merge keeps every row of large and looks up each ('Name', 'Value') pair in small. Most of the heavy lifting is then done by the indicator keyword which, quoting the pd.merge version 0.25.0 docs:

If True, adds a column to output DataFrame called "_merge" with information on the source of each row. Information column is Categorical-type and takes on a value of "left_only" for observations whose merge key only appears in 'left' DataFrame, "right_only" for observations whose merge key only appears in 'right' DataFrame, and "both" if the observation's merge key is found in both.

So, new_large is the original large DataFrame with a new column called _merge. Its entries are 'left_only' for the rows of large whose (Name, Value) pair has no match in small, and 'both' for the rows that match small on Name as well as Value. The last step is changing both and left_only to 1 and 0, as you specified.
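As a rough end-to-end illustration of the steps above: the large DataFrame below is a made-up stand-in, since the original was posted as an image, and only the small DataFrame comes from the question.

```python
import pandas as pd

# Hypothetical stand-in for the large DataFrame (the real one was an image).
large = pd.DataFrame({
    "Name": ["SF", "SF", "TH", "TH", "NY"],
    "Value": [12.84, 3.10, -49.45, 7.00, 12.84],
})

# The small DataFrame from the question.
small = pd.DataFrame({"Name": ["SF", "TH"], "Value": [12.84, -49.45]})

# Left join on both columns; indicator=True adds the _merge column.
new_large = pd.merge(large, small, how="left",
                     on=["Name", "Value"], indicator=True)

# 1 where the (Name, Value) pair exists in small, 0 otherwise.
new_large["_merge"] = (new_large["_merge"] == "both").astype(int)

print(new_large)
```

With this sample data, only the rows (SF, 12.84) and (TH, -49.45) get a 1. The row (NY, 12.84) gets a 0 because the Value matches but the Name does not. Note that the match on Value is an exact floating-point comparison, so values that differ only by rounding will not match.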

Note that a left join keeps every row of large whether or not it has a match, so new_large has the same rows as large (assuming the (Name, Value) pairs in small are unique). If small had columns beyond the join keys, those columns would hold NaN in the rows of large that found no match, and you could use that, or the _merge indicator, to build the Boolean (integer) column showing what matched and what didn't. HTH.
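One way NaN values can show up, sketched under the assumption that small carries an extra column beyond the join keys. The column name Source and all of the data here are hypothetical:

```python
import pandas as pd

# Made-up data; "Source" is a hypothetical extra column on small.
large = pd.DataFrame({"Name": ["SF", "TH", "NY"],
                      "Value": [12.84, -49.45, 1.00]})
small = pd.DataFrame({"Name": ["SF", "TH"],
                      "Value": [12.84, -49.45],
                      "Source": ["a", "b"]})

# Without indicator=True, unmatched rows are only visible as NaN in Source.
merged = pd.merge(large, small, how="left", on=["Name", "Value"])

# Build the 1/0 flag from the presence of a value in the extra column.
merged["matched"] = merged["Source"].notna().astype(int)

print(merged)
```

This only works if the extra column never contains missing values in small itself, which is one reason indicator=True is usually the more robust choice.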
