Pandas: creating a table that inserts new rows based on column values?

I have a dataframe that holds the name of an item, data on it, and then competitor data, all in one row:

 name   value1   value2    ex_value1     ex_value2   
 jim       0.4      0.6           0.7           0.3  
 tim       0.2      0.8   0.766666667   0.233333333  
 john        1        0           0.5           0.5  
 paul      0.9      0.1   0.533333333   0.466666667  

What I want to do is create a new table that is indexed by name but inserts new rows based on the competitor data, so that it shows jim, ex_jim, tim, ex_tim, etc.:

   name       value1        value2     
  jim               0.4           0.6  
  tim               0.2           0.8  
  john                1             0  
  paul              0.9           0.1  
  ex_jim            0.7           0.3  
  ex_tim    0.766666667   0.233333333  
  ex_john           0.5           0.5  
  ex_paul   0.533333333   0.466666667  

How would I go about doing this? Would I have to set the index on name and then insert the new rows that way? Would I go about this with a loop? I'd appreciate guidance on this.

You can do this using concat:

df_ex = df[['name','ex_value1', 'ex_value2']].rename(columns = {'ex_value1': 'value1', 'ex_value2': 'value2'})

df_ex['name'] = 'ex_' + df_ex['name']

pd.concat([df[['name','value1', 'value2']], df_ex ]).round(2)

    name    value1  value2
0   jim     0.40    0.60
1   tim     0.20    0.80
2   john    1.00    0.00
3   paul    0.90    0.10
0   ex_jim  0.70    0.30
1   ex_tim  0.77    0.23
2   ex_john 0.50    0.50
3   ex_paul 0.53    0.47
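
The index in the output above repeats 0 to 3 because concat keeps each frame's original row labels. A minimal sketch of the same approach, assuming the sample data from the question and passing ignore_index=True so the result gets a fresh 0 to 7 index (the variable name result is only illustrative):

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "name": ["jim", "tim", "john", "paul"],
    "value1": [0.4, 0.2, 1.0, 0.9],
    "value2": [0.6, 0.8, 0.0, 0.1],
    "ex_value1": [0.7, 0.766666667, 0.5, 0.533333333],
    "ex_value2": [0.3, 0.233333333, 0.5, 0.466666667],
})

# Competitor rows: rename the ex_ columns and prefix the name
df_ex = (
    df[["name", "ex_value1", "ex_value2"]]
    .rename(columns={"ex_value1": "value1", "ex_value2": "value2"})
    .assign(name=lambda d: "ex_" + d["name"])
)

# ignore_index=True discards the old labels and renumbers the rows
result = pd.concat(
    [df[["name", "value1", "value2"]], df_ex], ignore_index=True
).round(2)
print(result)
```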

I would rather rebuild the DataFrame from its values; you can add reset_index() at the end if you want name back as a regular column:

pd.DataFrame(df.iloc[:,1:].values.reshape(8,2),index=['','ex_']*4+df.name.repeat(2),columns=['value1','value2'])
Out[986]: 
           value1    value2
name                       
jim      0.400000  0.600000
ex_jim   0.700000  0.300000
tim      0.200000  0.800000
ex_tim   0.766667  0.233333
john     1.000000  0.000000
ex_john  0.500000  0.500000
paul     0.900000  0.100000
ex_paul  0.533333  0.466667
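
The reshape above hardcodes 8 rows and 4 names. As a hedged sketch of the same idea that derives those sizes from the data, assuming the four value columns appear in the order value1, value2, ex_value1, ex_value2 (the names n, values, labels and interleaved are illustrative):

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "name": ["jim", "tim", "john", "paul"],
    "value1": [0.4, 0.2, 1.0, 0.9],
    "value2": [0.6, 0.8, 0.0, 0.1],
    "ex_value1": [0.7, 0.766666667, 0.5, 0.533333333],
    "ex_value2": [0.3, 0.233333333, 0.5, 0.466666667],
})

n = len(df)

# Each input row [v1, v2, ex_v1, ex_v2] becomes two output rows
values = (
    df[["value1", "value2", "ex_value1", "ex_value2"]]
    .to_numpy()
    .reshape(2 * n, 2)
)

# Build the labels in the same interleaved order: name, ex_name, ...
labels = [prefix + name for name in df["name"] for prefix in ("", "ex_")]

interleaved = pd.DataFrame(
    values,
    index=pd.Index(labels, name="name"),
    columns=["value1", "value2"],
)
print(interleaved)
```

Because this relies on column position rather than column names, it is only safe when the column order is known and fixed.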

I would recommend splitting your dataframe into two and then concatenating the pieces back together. Something like:

import pandas as pd

df = pd.DataFrame([['jim', .4, .6, .7, .3], ['john', 1, 0, .5, .5]], columns=['name', 'value1', 'value2', 'ex_value1', 'ex_value2'])

ex_df = df.copy()

ex_df['name'] = 'ex_'+ex_df['name'].astype(str)

ex_df = ex_df[['name', 'ex_value1', 'ex_value2']]
ex_df.columns = ['name', 'value1', 'value2']

df = df[['name', 'value1', 'value2']]

frames = (df, ex_df)

new = pd.concat(frames).reset_index()
new = new[['name', 'value1', 'value2']]

print(new)

#output
      name  value1  value2
0      jim     0.4     0.6
1     john     1.0     0.0
2   ex_jim     0.7     0.3
3  ex_john     0.5     0.5

You could go for something like this:

def myfunc(row):
    return pd.Series({'name': 'ex_{}'.format(row['name']), 
                      'value1': row['ex_value1'], 
                      'value2': row['ex_value2']})

df2 = df[~df['name'].astype(str).str.startswith('ex_')].apply(myfunc,axis =1)
df = pd.concat([df[['name', 'value1', 'value2']], df2])

This applies the function myfunc only to those rows where name does not start with ex_. For each row, myfunc() returns a Series; apply(..., axis=1) assembles those Series into a new dataframe, which is then concatenated to df.
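
To make the return types concrete, here is a small sketch on two rows of the question's sample data. It only checks what myfunc and apply hand back and is not a replacement for the answer's code:

```python
import pandas as pd

# Two rows of the question's sample data
df = pd.DataFrame({
    "name": ["jim", "john"],
    "value1": [0.4, 1.0],
    "value2": [0.6, 0.0],
    "ex_value1": [0.7, 0.5],
    "ex_value2": [0.3, 0.5],
})

def myfunc(row):
    # Build one competitor row from one original row
    return pd.Series({"name": "ex_{}".format(row["name"]),
                      "value1": row["ex_value1"],
                      "value2": row["ex_value2"]})

one_row = myfunc(df.iloc[0])      # a single Series
df2 = df.apply(myfunc, axis=1)    # one Series per row, stacked into a DataFrame

print(type(one_row).__name__)
print(type(df2).__name__)
print(df2)
```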


For one-liner lovers (though not really advisable):

 df = pd.concat([df[['name', 'value1', 'value2']], df[~df['name'].astype(str).str.startswith('ex_')].apply(myfunc,axis = 1)]) 

You could use a combination of melt and pivot:

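df2 = df.melt('name')
# Prefix the name on rows that came from the ex_ columns
df2.loc[df2.variable.str.startswith('ex_'), 'name'] = 'ex_' + df2.name
# Remove the literal 'ex_' prefix; str.strip('ex_') strips any of the
# characters e, x and _ from both ends and only works here by accident
df2.variable = df2.variable.str.replace('^ex_', '', regex=True)
df2 = df2.pivot(index='name', columns='variable').reset_index()
df2.columns = df2.columns.droplevel(0)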

which gives you:

variable             value1    value2
0          ex_jim  0.700000  0.300000
1         ex_john  0.500000  0.500000
2         ex_paul  0.533333  0.466667
3          ex_tim  0.766667  0.233333
4             jim  0.400000  0.600000
5            john  1.000000  0.000000
6            paul  0.900000  0.100000
7             tim  0.200000  0.800000
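
Note that the name column has a blank header in the output above: pivot is called without a values argument, so the columns come back as a MultiIndex, and droplevel(0) leaves an empty label for name. A hedged variant, assuming the question's sample data, that passes values='value' so the columns stay flat (the names long and wide are illustrative):

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "name": ["jim", "tim", "john", "paul"],
    "value1": [0.4, 0.2, 1.0, 0.9],
    "value2": [0.6, 0.8, 0.0, 0.1],
    "ex_value1": [0.7, 0.766666667, 0.5, 0.533333333],
    "ex_value2": [0.3, 0.233333333, 0.5, 0.466666667],
})

# Long format: one row per (name, variable) pair
long = df.melt(id_vars="name")

# Move the ex_ prefix from the variable name onto the item name
is_ex = long["variable"].str.startswith("ex_")
long.loc[is_ex, "name"] = "ex_" + long.loc[is_ex, "name"]
long["variable"] = long["variable"].str.replace("^ex_", "", regex=True)

# values="value" keeps the resulting columns single-level
wide = long.pivot(index="name", columns="variable", values="value").reset_index()
wide.columns.name = None  # drop the leftover "variable" label
print(wide)
```

As in the original answer, pivot sorts the rows by name, so the ex_ rows come first rather than following the order of the input.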

