Insert a row to pandas dataframe

I have a dataframe:

s1 = pd.Series([5, 6, 7])
s2 = pd.Series([7, 8, 9])

df = pd.DataFrame([list(s1), list(s2)],  columns =  ["A", "B", "C"])

   A  B  C
0  5  6  7
1  7  8  9

[2 rows x 3 columns]

and I need to add a first row [2, 3, 4] to get:

   A  B  C
0  2  3  4
1  5  6  7
2  7  8  9

I've tried the append() and concat() functions but can't find the right way to do that.

How do I add/insert a series into a dataframe?

Just assign the row to a particular index, using loc:

 df.loc[-1] = [2, 3, 4]  # adding a row
 df.index = df.index + 1  # shifting index
 df = df.sort_index()  # sorting by index

And you get, as desired:

    A  B  C
 0  2  3  4
 1  5  6  7
 2  7  8  9

See the pandas documentation, Indexing: Setting with enlargement.
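
The three steps above can be wrapped in a small helper. This is only a sketch, not part of the original answer: the name prepend_row is hypothetical, and it assumes the dataframe has the default integer index (0..n-1), since the label -1 and the shift by one only make sense in that case.

```python
import pandas as pd

def prepend_row(df, row):
    """Return a copy of df with `row` inserted as the first row.

    Hypothetical helper; assumes df has the default integer index 0..n-1.
    """
    out = df.copy()
    out.loc[-1] = row            # setting with enlargement: adds the label -1
    out.index = out.index + 1    # shift all labels so the new row becomes 0
    return out.sort_index()      # sorting by label moves the new row to the top

df = pd.DataFrame([[5, 6, 7], [7, 8, 9]], columns=["A", "B", "C"])
result = prepend_row(df, [2, 3, 4])
print(result)
```

Working on a copy keeps the caller's dataframe unchanged, at the cost of one extra copy per call.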

Not sure how you were calling concat(), but it should work as long as both objects are of the same type. Maybe the issue is that you need to cast your second vector to a dataframe? Using the df that you defined, the following works for me:

df2 = pd.DataFrame([[2,3,4]], columns=['A','B','C'])
pd.concat([df2, df])

One way to achieve this is:

>>> pd.DataFrame(np.array([[2, 3, 4]]), columns=['A', 'B', 'C']).append(df, ignore_index=True)
Out[330]: 
   A  B  C
0  2  3  4
1  5  6  7
2  7  8  9

Generally, it's easiest to append dataframes, not series. In your case, since you want the new row to be "on top" (with the starting id), and there is no function pd.prepend(), I first create the new dataframe and then append your old one.

ignore_index discards the existing index of your dataframe, so the rows of the old dataframe continue from index 1 instead of restarting at index 0.
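
Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so the call above fails on current versions. A minimal sketch of the same idea with pd.concat, assuming df is the dataframe from the question:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[5, 6, 7], [7, 8, 9]], columns=["A", "B", "C"])

# Build the one-row dataframe first, then put the old dataframe after it.
new_row = pd.DataFrame(np.array([[2, 3, 4]]), columns=["A", "B", "C"])
result = pd.concat([new_row, df], ignore_index=True)  # renumber rows 0..n-1
print(result)
```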

Typical disclaimer: Cetero censeo ... appending rows is a quite inefficient operation. If you care about performance and can somehow first create a dataframe with the correct (longer) index and then just insert the additional row into it, you should definitely do that. See:

>>> index = np.array([0, 1, 2])
>>> df2 = pd.DataFrame(columns=['A', 'B', 'C'], index=index)
>>> df2.loc[0:1] = [list(s1), list(s2)]
>>> df2
Out[336]: 
     A    B    C
0    5    6    7
1    7    8    9
2  NaN  NaN  NaN
>>> df2 = pd.DataFrame(columns=['A', 'B', 'C'], index=index)
>>> df2.loc[1:] = [list(s1), list(s2)]

So far, we have what you had as df:

>>> df2
Out[339]: 
     A    B    C
0  NaN  NaN  NaN
1    5    6    7
2    7    8    9

But now you can easily insert the row as follows. Since the space was preallocated, this is more efficient.

>>> df2.loc[0] = np.array([2, 3, 4])
>>> df2
Out[341]: 
   A  B  C
0  2  3  4
1  5  6  7
2  7  8  9

Testing a few answers, it is clear that using pd.concat() is more efficient for large dataframes.

Comparing the performance of dict and list, the list is more efficient, but for small dataframes using a dict should be no problem and is somewhat more readable.


1st - pd.concat() + list

%%timeit
df = pd.DataFrame(columns=['a', 'b'])
for i in range(10000):
    df = pd.concat([pd.DataFrame([[1,2]], columns=df.columns), df], ignore_index=True)

4.88 s ± 47.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

2nd - df.append() + dict

%%timeit

df = pd.DataFrame(columns=['a', 'b'])
for i in range(10000):
    df = df.append({'a': 1, 'b': 2}, ignore_index=True)

10.2 s ± 41.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

3rd - pd.DataFrame().loc + index operations

%%timeit
df = pd.DataFrame(columns=['a','b'])
for i in range(10000):
    df.loc[-1] = [1,2]
    df.index = df.index + 1
    df = df.sort_index()

17.5 s ± 37.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
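
The three benchmarks above all grow the dataframe one row at a time. When the rows are available before the dataframe is needed, another option (not measured here, and not part of the original comparison) is to collect them in a plain Python container and build the dataframe once. A minimal sketch, assuming a collections.deque is used so that inserting at the front stays cheap:

```python
from collections import deque

import pandas as pd

rows = deque()
for i in range(10000):
    rows.appendleft([i, 2 * i])   # prepend: the newest row ends up first

# Construct the dataframe a single time from the collected rows.
df = pd.DataFrame(list(rows), columns=["a", "b"])
print(df.head())
```

This avoids creating a new dataframe on every iteration, but it only applies when you do not need a dataframe between insertions.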

I put together a short function that allows for a little more flexibility when inserting a row:

def insert_row(idx, df, df_insert):
    dfA = df.iloc[:idx, ]
    dfB = df.iloc[idx:, ]

    df = dfA.append(df_insert).append(dfB).reset_index(drop = True)

    return df

which could be further shortened to:

def insert_row(idx, df, df_insert):
    return df.iloc[:idx, ].append(df_insert).append(df.iloc[idx:, ]).reset_index(drop = True)

Then you could use something like:

df = insert_row(2, df, df_new)

where 2 is the index position in df where you want to insert df_new.
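
On pandas 2.0 and later, where DataFrame.append is no longer available, the same function can be written with pd.concat. This is a sketch of that variant rather than the original author's code; the sample data is an assumption taken from the question.

```python
import pandas as pd

def insert_row(idx, df, df_insert):
    """Insert the rows of df_insert before position idx of df."""
    parts = [df.iloc[:idx], df_insert, df.iloc[idx:]]
    # Concatenate top part, new rows, bottom part, then renumber the rows.
    return pd.concat(parts).reset_index(drop=True)

df = pd.DataFrame([[5, 6, 7], [7, 8, 9]], columns=["A", "B", "C"])
df_new = pd.DataFrame([[2, 3, 4]], columns=["A", "B", "C"])
result = insert_row(1, df, df_new)
print(result)
```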

We can use numpy.insert. This has the advantage of flexibility: you only need to specify the index you want to insert at.

s1 = pd.Series([5, 6, 7])
s2 = pd.Series([7, 8, 9])

df = pd.DataFrame([list(s1), list(s2)],  columns =  ["A", "B", "C"])

pd.DataFrame(np.insert(df.values, 0, values=[2, 3, 4], axis=0))

    0   1   2
0   2   3   4
1   5   6   7
2   7   8   9

In np.insert(df.values, 0, values=[2, 3, 4], axis=0), the 0 tells the function the position/index at which to place the new values.
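
As the output above shows, rebuilding the dataframe from the bare array loses the column names. A small sketch of the same approach that passes the original columns back in (the variable names are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[5, 6, 7], [7, 8, 9]], columns=["A", "B", "C"])

# Insert the new row at position 0 of the underlying array.
values = np.insert(df.values, 0, values=[2, 3, 4], axis=0)

# Reuse df.columns so the result keeps the labels A, B, C.
result = pd.DataFrame(values, columns=df.columns)
print(result)
```

Keep in mind that df.values returns a single array, so a dataframe with mixed column types would go through a common (often object) dtype here.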

It is pretty simple to add a row into a pandas DataFrame:

  1. Create a regular Python dictionary with the same column names as your DataFrame;

  2. Use the DataFrame.append() method and pass in your dictionary (.append() is a method on DataFrame instances);

  3. Add ignore_index=True right after your dictionary name.
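
The steps above rely on DataFrame.append, which is not available in pandas 2.0 and later. A sketch of the same dictionary-based idea using pd.concat instead, assuming the dataframe from the question; the dictionary is first wrapped in a one-row dataframe:

```python
import pandas as pd

df = pd.DataFrame([[5, 6, 7], [7, 8, 9]], columns=["A", "B", "C"])

# Step 1: a regular dictionary keyed by the column names.
new_row = {"A": 2, "B": 3, "C": 4}

# Steps 2 and 3: wrap the dictionary in a one-row dataframe and
# concatenate, renumbering the index with ignore_index=True.
result = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)
print(result)
```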

This might seem overly simple, but it's incredible that a simple insert-new-row function isn't built in. I've read a lot about appending a new df to the original, but I'm wondering if this would be faster.

df.loc[0] = [row1data, blah...]
i = len(df) + 1
df.loc[i] = [row2data, blah...]

Below would be the best way to insert a row into a pandas dataframe without sorting and resetting the index:

import pandas as pd

df = pd.DataFrame(columns=['a','b','c'])

def insert(df, row):
    insert_loc = df.index.max()

    if pd.isna(insert_loc):
        df.loc[0] = row
    else:
        df.loc[insert_loc + 1] = row

insert(df,[2,3,4])
insert(df,[8,9,0])
print(df)

concat() seems to be a bit faster than last row insertion and reindexing. In case someone wonders about the speed of the two top approaches:

In [x]: %%timeit
     ...: df = pd.DataFrame(columns=['a','b'])
     ...: for i in range(10000):
     ...:     df.loc[-1] = [1,2]
     ...:     df.index = df.index + 1
     ...:     df = df.sort_index()

17.1 s ± 705 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [y]: %%timeit
     ...: df = pd.DataFrame(columns=['a', 'b'])
     ...: for i in range(10000):
     ...:     df = pd.concat([pd.DataFrame([[1,2]], columns=df.columns), df])

6.53 s ± 127 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

It just occurred to me that the T attribute may be a valid choice. Transposing avoids the somewhat misleading df.loc[-1] = [2, 3, 4], as @flow2k mentioned, and it suits the more general case where you want to insert [2, 3, 4] before an arbitrary row, which is hard to achieve with concat() or append(). And there's no need to bear the trouble of defining and debugging a function.

a = df.T
a.insert(0,'anyName',value=[2,3,4])
# just give insert() any column name you want, we'll rename it.
a.rename(columns=dict(zip(a.columns,[i for i in range(a.shape[1])])),inplace=True)
# set inplace to a Boolean as you need.
df=a.T
df

    A   B   C
0   2   3   4
1   5   6   7
2   7   8   9

I guess this can partly explain @MattCochrane's complaint about why pandas doesn't have a method to insert a row the way insert() does for columns.

You can simply append the row to the end of the DataFrame, and then adjust the index.

For instance:

df = df.append(pd.DataFrame([[2,3,4]],columns=df.columns),ignore_index=True)
df.index = (df.index + 1) % len(df)
df = df.sort_index()

Or use concat as:

df = pd.concat([pd.DataFrame([[2,3,4]],columns=df.columns),df],ignore_index=True)

Do as in the following example:

a_row = pd.Series([1, 2])

df = pd.DataFrame([[3, 4], [5, 6]])

row_df = pd.DataFrame([a_row])

df = pd.concat([row_df, df], ignore_index=True)

and the result is:

   0  1
0  1  2
1  3  4
2  5  6

Create an empty df with column names:

df = pd.DataFrame(columns = ["A", "B", "C"])

Insert new rows:

df.loc[len(df.index)] = [2, 3, 4]
df.loc[len(df.index)] = [5, 6, 7]
df.loc[len(df.index)] = [7, 8, 9]
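
One caveat worth illustrating: df.loc[len(df.index)] only appends when that label is not already in use, which holds for the default 0..n-1 index built above. The sketch below uses an assumed non-default index to show an existing row being overwritten instead, and a pd.concat call that appends regardless of the labels.

```python
import pandas as pd

# A dataframe whose labels start at 1, so label 2 already exists.
df = pd.DataFrame([[5, 6, 7], [7, 8, 9]], columns=["A", "B", "C"], index=[1, 2])

overwritten = df.copy()
overwritten.loc[len(overwritten.index)] = [2, 3, 4]  # len is 2: replaces row 2

# Appending with concat does not depend on the existing labels.
appended = pd.concat(
    [df, pd.DataFrame([[2, 3, 4]], columns=df.columns)],
    ignore_index=True,
)
print(overwritten)
print(appended)
```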

Given that the data structure of a pandas dataframe is a list of series (each series is a column), it is convenient to insert a column at any position. So one idea I came up with is to first transpose your data frame, insert a column, and transpose it back. You may also need to rename the index (row names), like this:

s1 = pd.Series([5, 6, 7])
s2 = pd.Series([7, 8, 9])

df = pd.DataFrame([list(s1), list(s2)],  columns =  ["A", "B", "C"])
df = df.transpose()
df.insert(0, 2, [2,3,4])
df = df.transpose()
df.index = [i for i in range(3)]
df

    A   B   C
0   2   3   4
1   5   6   7
2   7   8   9

s1 = pd.Series([5, 6, 7])
s2 = pd.Series([7, 8, 9])

df = pd.DataFrame([list(s1), list(s2)],  columns =  ["A", "B", "C"])

To insert a new row anywhere, you can specify the row position: row_pos = -1 for inserting at the top, or row_pos = 0.5 for inserting between row 0 and row 1.

row_pos = -1
insert_row = [2,3,4]

df.loc[row_pos] = insert_row
df = df.sort_index()
df = df.reset_index(drop = True)

row_pos = -1

The outcome is:

    A   B   C
0   2   3   4
1   5   6   7
2   7   8   9

row_pos = 0.5

The outcome is:

    A   B   C
0   5   6   7
1   2   3   4
2   7   8   9
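
The fractional-position idea above can be packaged as a function. This is a sketch under the same assumption as the answer, namely a default integer index; the name insert_at is hypothetical.

```python
import pandas as pd

def insert_at(df, row, row_pos):
    """Insert `row` at the label row_pos, then renumber the rows.

    Hypothetical helper; assumes df has the default integer index, so
    -1 sorts before every row and 0.5 sorts between rows 0 and 1.
    """
    out = df.copy()
    out.loc[row_pos] = row                          # add the row under a new label
    return out.sort_index().reset_index(drop=True)  # order by label, renumber

df = pd.DataFrame([[5, 6, 7], [7, 8, 9]], columns=["A", "B", "C"])
top = insert_at(df, [2, 3, 4], -1)
middle = insert_at(df, [2, 3, 4], 0.5)
print(top)
print(middle)
```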

For those who want to concat a row from the previous data frame, use double brackets with iloc.

s1 = pd.Series([5, 6, 7])
s2 = pd.Series([7, 8, 9])

df = pd.DataFrame([list(s1), list(s2)],  columns =  ["A", "B", "C"])

#   A   B   C
# 0 5   6   7
# 1 7   8   9

pd.concat((df.iloc[[0]],
           df), ignore_index=True)

#   A   B   C
# 0 5   6   7
# 1 5   6   7
# 2 7   8   9

To duplicate a row an arbitrary number of times, combine this with star unpacking.

pd.concat((pd.concat((df.iloc[[0]],
                      df), ignore_index=True),
           df.iloc[[0]],
           *[df.iloc[[1]]] * 4),ignore_index=True)
#   A   B   C
# 0 5   6   7
# 1 5   6   7
# 2 7   8   9
# 3 5   6   7
# 4 7   8   9
# 5 7   8   9
# 6 7   8   9
# 7 7   8   9

The simplest way to add a row to a pandas data frame is:

DataFrame.loc[ location of insertion ]= list( )

Example:

DF.loc[9] = ['Pepe', 33, 'Japan']

NB: the length of your list should match the number of columns in the data frame.
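
A short sketch of what happens when the lengths do not match, using an assumed three-column dataframe: pandas rejects the assignment with a ValueError and leaves the dataframe unchanged.

```python
import pandas as pd

df = pd.DataFrame([[5, 6, 7], [7, 8, 9]], columns=["A", "B", "C"])

error = None
try:
    df.loc[2] = [2, 3]          # two values for three columns
except ValueError as exc:
    error = exc                 # the mismatched row is rejected

print(error)
df.loc[2] = [2, 3, 4]           # matching length: the row is added
print(df)
```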

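Notice: the technical posts on this site follow the CC BY-SA 4.0 license. If you repost them, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.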
