Python: Merging/joining two dataframes
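I am trying to merge/join two dataframes, each with three keys (Age, Gender and Signed_In). Both dataframes have the same parent and were created by groupby, but each has its own unique value columns.
Given that the unique key combinations are shared between the two dataframes, the merge/join seems like it should be painless. I assume there is some simple error in my attempts at 'merge' and 'join', but I can't for the life of me resolve it.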
import pandas as pd
times = pd.read_csv('nytimes.csv')
# Produces times_mean table consisting of two value columns, avg_impressions and avg_clicks
times_mean = times.groupby(['Age','Gender','Signed_In']).mean()
times_mean.columns = ['avg_impressions', 'avg_clicks']
# Produces times_max table consisting of two value columns, max_impressions and max_clicks
times_max = times.groupby(['Age','Gender','Signed_In']).max()
times_max.columns = ['max_impressions', 'max_clicks']
# Following intended to produce combined table with four value columns
times_join = times_mean.join(times_max, on = ['Age', 'Gender', 'Signed_In'])
times_join2 = pd.merge(times_mean, times_max, on=['Age', 'Gender', 'Signed_In'])
You don't need the on kwarg when joining on equivalently structured MultiIndexes.
Here is an example that demonstrates this:
import numpy as np
import pandas
a = np.random.normal(size=10)
b = a + 10
index = pandas.MultiIndex.from_product([['A', 'B'], list('abcde')])
df_a = pandas.DataFrame(a, index=index, columns=['colA'])
df_b = pandas.DataFrame(b, index=index, columns=['colB'])
df_a.join(df_b)
This gives me:
         colA       colB
A a -1.525376   8.474624
  b  0.778333  10.778333
  c  1.153172  11.153172
  d  0.966560  10.966560
  e  0.089765  10.089765
B a  0.717717  10.717717
  b  0.305545  10.305545
  c  0.123548  10.123548
  d -1.018660   8.981340
  e -0.635103   9.364897
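Applied to the question's case, the same idea might look like the sketch below. Since nytimes.csv is not available here, the small times frame is a made-up stand-in, and the value column names Impressions and Clicks are assumptions inferred from the renamed columns in the question. Because both groupby results carry the same three-level index, join can be called without on, and pd.merge can be told to use the index on both sides.

```python
import pandas as pd

# Hypothetical stand-in for nytimes.csv; the column names are assumed.
times = pd.DataFrame({
    'Age': [25, 25, 25, 40, 40, 40],
    'Gender': [0, 0, 1, 1, 1, 0],
    'Signed_In': [1, 1, 1, 1, 1, 0],
    'Impressions': [3, 5, 4, 6, 2, 7],
    'Clicks': [0, 1, 0, 2, 0, 1],
})

keys = ['Age', 'Gender', 'Signed_In']

# After groupby, the three keys become the levels of a MultiIndex.
times_mean = times.groupby(keys).mean()
times_mean.columns = ['avg_impressions', 'avg_clicks']

times_max = times.groupby(keys).max()
times_max.columns = ['max_impressions', 'max_clicks']

# join aligns on the shared MultiIndex by default, so no `on` is needed.
times_join = times_mean.join(times_max)

# The merge equivalent: match on the index of both frames explicitly.
times_join2 = pd.merge(times_mean, times_max,
                       left_index=True, right_index=True)

print(times_join)
```

If the keys are needed as ordinary columns afterwards, calling reset_index() on the joined frame turns the index levels back into columns.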