
How can I restructure a dataframe to create new column labels based on Column[se] values and then populate those new columns with Column[value] values?

Original Dataframe

index                Date  Device  Element  Sub_Element    Value
179593 2017-11-28 16:39:00       x        y   eth_txload        9
179594 2017-11-28 16:39:00       x        y   eth_rxload       30
179595 2017-11-28 16:39:00       x        y  eth_ip_addr  x.x.x.x
179596 2017-11-28 16:39:00       x        y  description   string

Desired Dataframe

               Date  Device  Element  description  eth_txload  eth_rxload  eth_ip_addr
2017-11-28 16:39:00       x        y       string           9          30      x.x.x.x

What would be the best way to go about this?

Create a dataframe for each Sub_Element and merge them with on=['Date', 'Device', 'Element']?

Or use some df.iloc magic to create a boolean mask and apply the value to a new column?

Or maybe there is a better/more efficient way I'm missing?

IIUC, given:

print(df)

    index                 Date Device Element  Sub_Element    Value
0  179593  2017-11-28 16:39:00      x       y   eth_txload        9
1  179594  2017-11-28 16:39:00      x       y   eth_rxload       30
2  179595  2017-11-28 16:39:00      x       y  eth_ip_addr  x.x.x.x
3  179596  2017-11-28 16:39:00      x       y  description   string

Then:

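# drop(columns=...) replaces the positional axis argument drop('index', 1),
# which recent pandas versions no longer accept
df_out = df.set_index(['Date','Device','Element','Sub_Element'])\
           .drop(columns='index').unstack()['Value'].reset_index()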

print(df_out)

Output:

Sub_Element                 Date Device Element description eth_ip_addr eth_rxload eth_txload
0            2017-11-28 16:39:00      x       y      string     x.x.x.x         30          9
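
For comparison, the same reshaping can also be written with `DataFrame.pivot`. This is only a sketch and not part of the answer above: it rebuilds the sample frame inline so that it runs without the clipboard, keeps every value as a string, and assumes pandas 1.1 or later, where `pivot` accepts a list for `index`.

```python
import pandas as pd

# Sample data rebuilt from the question (all values kept as strings).
df = pd.DataFrame({
    'index': [179593, 179594, 179595, 179596],
    'Date': ['2017-11-28 16:39:00'] * 4,
    'Device': ['x'] * 4,
    'Element': ['y'] * 4,
    'Sub_Element': ['eth_txload', 'eth_rxload', 'eth_ip_addr', 'description'],
    'Value': ['9', '30', 'x.x.x.x', 'string'],
})

# One row per (Date, Device, Element); one column per Sub_Element value.
df_out = (
    df.pivot(index=['Date', 'Device', 'Element'],
             columns='Sub_Element',
             values='Value')
      .reset_index()
)
df_out.columns.name = None  # drop the leftover 'Sub_Element' axis label

print(df_out)
```

`pivot` raises a `ValueError` when the same Date/Device/Element/Sub_Element combination appears more than once. If that can happen in your data, `pivot_table` with `aggfunc='first'` is one way to pick a single value per cell.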

Here is how I did it. My solution is not as "fancy" as Scott's, but I broke my logic down into steps. His solution is probably better for a plug-and-play scenario:

import pandas as pd

# reading in the dataframe from your text
# note: judging by the output below, the whitespace split puts the date
# under 'index' and the time under 'Date'
df1 = pd.read_clipboard()

# creating an untouched copy of df1 for manipulation
df2 = df1.copy()

# dropping the duplicates of index and Date to get one row
df1 = df1.drop_duplicates(subset=['index', 'Date'])

# creating a dictionary of key, value pairs for each column and value
kv = dict(zip(df2.Sub_Element, df2.Value))

# creating a dataframe out of the above dictionary
new_df = pd.DataFrame(kv, index=[0])

# creating temp values to merge on
df1['tmp'] = 1
new_df['tmp'] = 1

# merging on the tmp values
output_df = df1.merge(new_df, on='tmp')

# cleaning up for the output
del output_df['Sub_Element']
del output_df['Value']
del output_df['tmp']

#output
        index      Date Device Element description eth_ip_addr eth_rxload eth_txload
0  2017-11-28  16:39:00      x       y      string     x.x.x.x         30          9
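
The dictionary step above folds every row into a single dictionary, so it suits a frame that holds one Date/Device/Element combination. The sketch below shows one way the same idea might extend to several combinations by building one dictionary per group. The device names `a` and `b` and their values are made up for illustration.

```python
import pandas as pd

keys = ['Date', 'Device', 'Element']

# Hypothetical data with two devices (not from the original question).
df = pd.DataFrame({
    'Date': ['2017-11-28 16:39:00'] * 4,
    'Device': ['a', 'a', 'b', 'b'],
    'Element': ['y'] * 4,
    'Sub_Element': ['eth_txload', 'description', 'eth_txload', 'description'],
    'Value': ['9', 'first', '5', 'second'],
})

rows = []
for group_keys, group in df.groupby(keys):
    # start each output row with the grouping columns
    row = dict(zip(keys, group_keys))
    # then add one key/value pair per Sub_Element, as in the answer above
    row.update(zip(group['Sub_Element'], group['Value']))
    rows.append(row)

output_df = pd.DataFrame(rows)
print(output_df)
```

Because each group gets its own dictionary, no temporary merge column is needed in this variant.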

An admittedly more SQL-like solution, but one that avoids dealing with indexes:

import pandas as pd

# read in the dataframe
df = pd.read_clipboard()

# set up what we will be joining to
anchor = df[['Date','Device','Element']].drop_duplicates()

# loop through the values we want to pivot out
for element in df['Sub_Element'].unique():

    # filter the original dataframe for the value for Sub_Element
    # using the copy method avoids SettingWithCopyWarning
    temp = df[df['Sub_Element']==element].copy() 

    temp.rename(columns={'Value':element},inplace=True) #rename the header

    # left join the new dataframe to the anchor in case of NaNs
    anchor = anchor.merge(temp[['Date','Device','Element',element]],
                          on=['Date','Device','Element'],how='left')
print(anchor)

Output:

                  Date Device Element eth_txload eth_rxload eth_ip_addr description
0  2017-11-28 16:39:00      x       y          9         30     x.x.x.x      string
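
The left join matters once some combination lacks a Sub_Element. As a small illustration, the sketch below runs the same loop on made-up data in which device `b` has no `description` row. The data is hypothetical, and the loop is lightly condensed from the answer above.

```python
import pandas as pd

keys = ['Date', 'Device', 'Element']

# Hypothetical data: device 'b' is missing its 'description' row.
df = pd.DataFrame({
    'Date': ['2017-11-28 16:39:00'] * 3,
    'Device': ['a', 'a', 'b'],
    'Element': ['y'] * 3,
    'Sub_Element': ['eth_txload', 'description', 'eth_txload'],
    'Value': ['9', 'string', '5'],
})

# one row per key combination to join everything onto
anchor = df[keys].drop_duplicates()

for element in df['Sub_Element'].unique():
    # rows for this Sub_Element, with 'Value' renamed to the element name
    temp = df.loc[df['Sub_Element'] == element, keys + ['Value']]
    temp = temp.rename(columns={'Value': element})
    # the left join keeps every anchor row; missing matches become NaN
    anchor = anchor.merge(temp, on=keys, how='left')

print(anchor)
```

With an inner join, the row for device `b` would be dropped as soon as the `description` column is merged. The left join keeps it and fills the gap with NaN.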
