
Python: Merging/joining two dataframes

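I am trying to merge/join two dataframes, each with three keys (Age, Gender and Signed_In). Both dataframes have the same parent and were created by groupby, but each has its own unique value columns.

Since the unique key combinations are shared between the two dataframes, the merge/join seems like it should be painless. I figure there must be some simple mistake in my attempts at "merge" and "join", but I can't for the life of me work out what it is.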

import pandas as pd

times = pd.read_csv('nytimes.csv')

# Produces times_mean table consisting of two value columns, avg_impressions and avg_clicks
times_mean = times.groupby(['Age','Gender','Signed_In']).mean()
times_mean.columns = ['avg_impressions', 'avg_clicks']

# Produces times_max table consisting of two value columns, max_impressions and max_clicks
times_max = times.groupby(['Age','Gender','Signed_In']).max()
times_max.columns = ['max_impressions', 'max_clicks']

# Following intended to produce combined table with four value columns
times_join = times_mean.join(times_max, on = ['Age', 'Gender', 'Signed_In'])
times_join2 = pd.merge(times_mean, times_max, on=['Age', 'Gender', 'Signed_In'])

You don't need the "on" kwarg when joining on equivalently structured MultiIndexes.

Here's an example demonstrating this:

import numpy as np
import pandas

a = np.random.normal(size=10)
b = a + 10
index = pandas.MultiIndex.from_product([['A', 'B'], list('abcde')])

df_a = pandas.DataFrame(a, index=index, columns=['colA'])
df_b = pandas.DataFrame(b, index=index, columns=['colB'])

df_a.join(df_b)

Which gives me:

         colA       colB
A a -1.525376   8.474624
  b  0.778333  10.778333
  c  1.153172  11.153172
  d  0.966560  10.966560
  e  0.089765  10.089765
B a  0.717717  10.717717
  b  0.305545  10.305545
  c  0.123548  10.123548
  d -1.018660   8.981340
  e -0.635103   9.364897
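
Applied to the dataframes from the question, the same idea might look like the sketch below. The file nytimes.csv isn't available here, so a small made-up frame stands in for it; the Impressions and Clicks column names and all the values are assumptions for illustration only. Because both groupby results carry the same (Age, Gender, Signed_In) MultiIndex, join can align on it with no "on" argument, and merge can be told to use the index on both sides.

```python
import pandas as pd

# Hypothetical stand-in for nytimes.csv: column names and values are
# assumptions, not the real data from the question.
times = pd.DataFrame({
    'Age': [25, 25, 40, 40, 40],
    'Gender': [0, 0, 1, 1, 1],
    'Signed_In': [1, 1, 1, 1, 1],
    'Impressions': [3, 5, 2, 4, 6],
    'Clicks': [0, 1, 0, 2, 1],
})

keys = ['Age', 'Gender', 'Signed_In']

# Same construction as in the question: the keys end up in the index.
times_mean = times.groupby(keys).mean()
times_mean.columns = ['avg_impressions', 'avg_clicks']

times_max = times.groupby(keys).max()
times_max.columns = ['max_impressions', 'max_clicks']

# Both frames share the same MultiIndex, so join aligns on it directly.
times_join = times_mean.join(times_max)

# The merge equivalent: use the index on both sides instead of columns.
times_join2 = pd.merge(times_mean, times_max,
                       left_index=True, right_index=True)

print(times_join)
```

Either call should produce one row per (Age, Gender, Signed_In) combination with all four value columns.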
