Pandas lookup/pivot using column headings

I have a table containing watershed IDs and land cover classes:

WatershedID LandCover
          2      Corn
          8      Corn
          2       Soy
          8       Soy

and a separate lookup table that contains the area for each watershed/land cover combination:

WatershedID  Corn  Soy
          2    14    1
          3     2   14
          5    18    8
          7    21    2
          8     6   31

What I would like to do is append a column to the first table containing the corresponding row/column value from the lookup table, like so:

WatershedID LandCover   Area
          2      Corn     14
          8      Corn      6
          2       Soy      1
          8       Soy     31

I've managed to do this by iterating with a for loop:

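areas = []
# iterrows() yields (index, row) pairs, so read both keys from the row
for _, row in tableA.iterrows():
    # tableB is assumed to be indexed by WatershedID
    areas.append(tableB.loc[row['WatershedID'], row['LandCover']])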

but given the size of my tables, this is slow. Is there a faster way to do this that doesn't involve iteration? I've been experimenting with MultiIndexing and pivot tables, but nothing has worked so far.
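
For reference, here is a minimal reproducible version of the setup described above. It is only a sketch: the names tableA and tableB come from the loop in the question, while the `lookup` variable and the choice to index the lookup table by WatershedID are assumptions made so that `.loc` can address rows by ID.

```python
import pandas as pd

# Sample data copied from the tables in the question.
tableA = pd.DataFrame({
    "WatershedID": [2, 8, 2, 8],
    "LandCover": ["Corn", "Corn", "Soy", "Soy"],
})
tableB = pd.DataFrame({
    "WatershedID": [2, 3, 5, 7, 8],
    "Corn": [14, 2, 18, 21, 6],
    "Soy": [1, 14, 8, 2, 31],
})

# Assumption: index the lookup table by WatershedID so rows are addressed by ID.
lookup = tableB.set_index("WatershedID")

# Row-by-row baseline: one scalar lookup per row of tableA.
areas = [
    lookup.loc[watershed_id, land_cover]
    for watershed_id, land_cover in zip(tableA["WatershedID"], tableA["LandCover"])
]
tableA["Area"] = areas
print(tableA)
```

Each iteration performs a separate label lookup, which is why this approach gets slow on large tables.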

You can use unstack with merge:

df3 = df2.set_index('WatershedID').unstack().reset_index()
df3.columns = ['LandCover','WatershedID','Area']
print(df3)
  LandCover  WatershedID  Area
0      Corn            2    14
1      Corn            3     2
2      Corn            5    18
3      Corn            7    21
4      Corn            8     6
5       Soy            2     1
6       Soy            3    14
7       Soy            5     8
8       Soy            7     2
9       Soy            8    31

print(pd.merge(df1, df3))
   WatershedID LandCover  Area
0            2      Corn    14
1            8      Corn     6
2            2       Soy     1
3            8       Soy    31
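
The steps above can be run end to end with the sample data from the question. This is a self-contained sketch: df1 and df2 are assumed to be the first table and the lookup table respectively, since the answer does not show how they were built.

```python
import pandas as pd

# Assumption: df1 is the watershed/land cover table, df2 is the wide lookup table.
df1 = pd.DataFrame({
    "WatershedID": [2, 8, 2, 8],
    "LandCover": ["Corn", "Corn", "Soy", "Soy"],
})
df2 = pd.DataFrame({
    "WatershedID": [2, 3, 5, 7, 8],
    "Corn": [14, 2, 18, 21, 6],
    "Soy": [1, 14, 8, 2, 31],
})

# Reshape the lookup table from wide to long: one row per
# (land cover, watershed) pair, with the area as the value.
df3 = df2.set_index("WatershedID").unstack().reset_index()
df3.columns = ["LandCover", "WatershedID", "Area"]

# With no `on` argument, merge joins on the columns both frames share,
# here WatershedID and LandCover.
result = pd.merge(df1, df3)
print(result)
```

The merge replaces the per-row lookups with a single join on the two key columns.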

If the two DataFrames have other column names in common as well, specify the join columns explicitly:

print(pd.merge(df1, df3, on=['WatershedID', 'LandCover']))
   WatershedID LandCover  Area
0            2      Corn    14
1            8      Corn     6
2            2       Soy     1
3            8       Soy    31
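
As a variation on the same idea (not part of the original answer), the wide-to-long reshape can also be written with melt, which names the new columns directly. The sketch below additionally assumes you want a left join, so that rows of the first table with no match in the lookup table are kept with a missing Area instead of being dropped; the variable name long_lookup is made up for illustration.

```python
import pandas as pd

# Assumption: df1 and df2 hold the sample tables from the question.
df1 = pd.DataFrame({
    "WatershedID": [2, 8, 2, 8],
    "LandCover": ["Corn", "Corn", "Soy", "Soy"],
})
df2 = pd.DataFrame({
    "WatershedID": [2, 3, 5, 7, 8],
    "Corn": [14, 2, 18, 21, 6],
    "Soy": [1, 14, 8, 2, 31],
})

# melt turns each land cover column into rows and names the result columns.
long_lookup = df2.melt(
    id_vars="WatershedID", var_name="LandCover", value_name="Area"
)

# how="left" keeps every row of df1 in its original order;
# validate="many_to_one" raises if the lookup has duplicate key pairs.
result = df1.merge(
    long_lookup,
    on=["WatershedID", "LandCover"],
    how="left",
    validate="many_to_one",
)
print(result)
```

Whether a left or inner join is appropriate depends on how unmatched watershed/land cover pairs should be handled.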
