How can I restructure a dataframe so that the values in one column (Sub_Element) become new column labels, and then populate those new columns with the values from another column (Value)?
Original Dataframe

index   Date                 Device  Element  Sub_Element  Value
179593  2017-11-28 16:39:00  x       y        eth_txload   9
179594  2017-11-28 16:39:00  x       y        eth_rxload   30
179595  2017-11-28 16:39:00  x       y        eth_ip_addr  x.x.x.x
179596  2017-11-28 16:39:00  x       y        description  string

Desired Dataframe

Date                 Device  Element  description  eth_txload  eth_rxload  eth_ip_addr
2017-11-28 16:39:00  x       y        string       9           30          x.x.x.x
What would be the best way to go about this?
Create a dataframe for each Sub_Element and merge on=['Date', 'Device', 'Element']?
Or use some df.iloc magic to create a boolean mask and apply the value to a new column?
Or maybe there is a better/more efficient way I'm missing?
IIUC, given:
print(df)
index Date Device Element Sub_Element Value
0 179593 2017-11-28 16:39:00 x y eth_txload 9
1 179594 2017-11-28 16:39:00 x y eth_rxload 30
2 179595 2017-11-28 16:39:00 x y eth_ip_addr x.x.x.x
3 179596 2017-11-28 16:39:00 x y description string
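Then:
# drop(columns='index') replaces the positional drop('index', 1),
# which recent pandas versions no longer accept
df_out = df.set_index(['Date','Device','Element','Sub_Element'])\
           .drop(columns='index').unstack()['Value'].reset_index()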
print(df_out)
Output:
Sub_Element Date Device Element description eth_ip_addr eth_rxload eth_txload
0 2017-11-28 16:39:00 x y string x.x.x.x 30 9
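The same reshape can also be written with DataFrame.pivot. The following is a minimal sketch, not part of the original answer: the sample frame is rebuilt by hand from the question's data (with every Value stored as a string, which is an assumption), and passing a list as index requires pandas 1.1 or later.

```python
import pandas as pd

# Sample data rebuilt from the question; storing Value as strings is an assumption.
df = pd.DataFrame({
    "index": [179593, 179594, 179595, 179596],
    "Date": ["2017-11-28 16:39:00"] * 4,
    "Device": ["x"] * 4,
    "Element": ["y"] * 4,
    "Sub_Element": ["eth_txload", "eth_rxload", "eth_ip_addr", "description"],
    "Value": ["9", "30", "x.x.x.x", "string"],
})

# Each distinct Sub_Element becomes a column; Value fills the cells.
df_out = df.pivot(
    index=["Date", "Device", "Element"],
    columns="Sub_Element",
    values="Value",
).reset_index()

# Remove the leftover "Sub_Element" label on the column axis.
df_out.columns.name = None

print(df_out)
```

Note that pivot raises an error if the same (Date, Device, Element, Sub_Element) combination appears more than once, so it fits data where each combination is unique.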
Here is how I did it. My solution is not as "fancy" as Scott's, but I broke down the steps in my logic. His solution is probably better for a plug-and-play scenario:
import pandas as pd

# read the dataframe in from the text in the question
df1 = pd.read_clipboard()
# keep an untouched copy of df1 for manipulation
df2 = df1.copy()
# drop duplicate key rows so that one row remains
# (the original used subset=['index', 'Date'], which only works
# when those two columns repeat on every row)
df1 = df1.drop_duplicates(subset=['Date', 'Device', 'Element'])
# build a dictionary mapping each Sub_Element to its Value
# (this assumes a single Date/Device/Element group)
kv = dict(zip(df2.Sub_Element, df2.Value))
# create a one-row dataframe out of the above dictionary
new_df = pd.DataFrame(kv, index=[0])
# create temporary key columns to merge on
df1['tmp'] = 1
new_df['tmp'] = 1
# merge on the temporary key
output_df = df1.merge(new_df, on='tmp')
# clean up for the output
del output_df['Sub_Element']
del output_df['Value']
del output_df['tmp']
#output
index Date Device Element description eth_ip_addr eth_rxload eth_txload
0 2017-11-28 16:39:00 x y string x.x.x.x 30 9
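The dictionary-and-merge steps above work for a single (Date, Device, Element) group. As a hedged sketch of how the same idea might extend to several groups, pivot_table with aggfunc="first" keeps the first Value seen for each Sub_Element within each group. The second timestamp in the sample data below is invented purely for illustration.

```python
import pandas as pd

# Hypothetical data: the 16:40:00 rows are made up to show a second group.
df = pd.DataFrame({
    "Date": ["2017-11-28 16:39:00"] * 4 + ["2017-11-28 16:40:00"] * 2,
    "Device": ["x"] * 6,
    "Element": ["y"] * 6,
    "Sub_Element": ["eth_txload", "eth_rxload", "eth_ip_addr",
                    "description", "eth_txload", "eth_rxload"],
    "Value": ["9", "30", "x.x.x.x", "string", "12", "31"],
})

# One output row per (Date, Device, Element); missing Sub_Elements become NaN.
wide = df.pivot_table(
    index=["Date", "Device", "Element"],
    columns="Sub_Element",
    values="Value",
    aggfunc="first",  # non-numeric values need a non-numeric aggregation
).reset_index()
wide.columns.name = None

print(wide)
```

Unlike pivot, pivot_table does not raise on repeated combinations; with aggfunc="first" it silently keeps the first occurrence, which may or may not be what you want.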
An admittedly more SQL-like solution, but one that avoids dealing with indexes:
import pandas as pd

# read in the dataframe
df = pd.read_clipboard()
# set up what we will be joining to
anchor = df[['Date','Device','Element']].drop_duplicates()
# loop through the values we want to pivot out
for element in df['Sub_Element'].unique():
    # filter the original dataframe for the value for Sub_Element
    # using the copy method avoids SettingWithCopyWarning
    temp = df[df['Sub_Element']==element].copy()
    temp.rename(columns={'Value':element},inplace=True) #rename the header
    # left join the new dataframe to the anchor in case of NaNs
    anchor = anchor.merge(temp[['Date','Device','Element',element]],
                          on=['Date','Device','Element'],how='left')
print(anchor)
Output:
Date Device Element eth_txload eth_rxload eth_ip_addr description
0 2017-11-28 16:39:00 x y 9 30 x.x.x.x string
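Because Value mixes numbers with strings, the reshaped columns may come out as generic object columns whichever approach is used. A small sketch of converting them afterwards follows; the frame is typed in by hand to match the output above, and the choice of which columns are numeric (numeric_cols) is an assumption for illustration.

```python
import pandas as pd

# Hand-built copy of the reshaped result, with every value still a string.
wide = pd.DataFrame({
    "Date": ["2017-11-28 16:39:00"],
    "Device": ["x"],
    "Element": ["y"],
    "eth_txload": ["9"],
    "eth_rxload": ["30"],
    "eth_ip_addr": ["x.x.x.x"],
    "description": ["string"],
})

# Assumption: only the load columns are meant to be numeric.
numeric_cols = ["eth_txload", "eth_rxload"]
# errors="coerce" turns anything unparseable into NaN instead of raising.
wide[numeric_cols] = wide[numeric_cols].apply(pd.to_numeric, errors="coerce")

# Parse the timestamp strings into real datetimes.
wide["Date"] = pd.to_datetime(wide["Date"])

print(wide.dtypes)
```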