Create a Pandas DataFrame by appending one row at a time

I understand that Pandas is designed to load a fully populated DataFrame, but I need to create an empty DataFrame and then add rows one by one. What is the best way to do this?

I successfully created an empty DataFrame with:

res = DataFrame(columns=('lib', 'qty1', 'qty2'))

Then I can add a new row and fill a field with:

res = res.set_value(len(res), 'qty1', 10.0)

It works, but it seems very odd :-/ (It fails when adding a string value.)

How can I add a new row to my DataFrame (with different column types)?

You can use df.loc[i], where the row with index i will be whatever you assign to it.

>>> import pandas as pd
>>> from numpy.random import randint

>>> df = pd.DataFrame(columns=['lib', 'qty1', 'qty2'])
>>> for i in range(5):
...     df.loc[i] = ['name' + str(i)] + list(randint(10, size=2))

>>> df
     lib qty1 qty2
0  name0    3    3
1  name1    2    4
2  name2    2    8
3  name3    2    1
4  name4    9    6

If you can get all the data for the data frame upfront, there is a much faster approach than appending to a data frame:

  1. Create a list of dictionaries in which each dictionary corresponds to an input data row.
  2. Create a data frame from this list.

I had a similar task where appending to a data frame row by row took 30 minutes, while creating a data frame from a list of dictionaries completed within seconds.

rows_list = []
for row in input_rows:

        dict1 = {}
        # get input row in dictionary format
        # key = col_name
        dict1.update(blah..) 

        rows_list.append(dict1)

df = pd.DataFrame(rows_list)               
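
A minimal runnable sketch of the two steps above. The `input_rows` data and the column names are made up for illustration; they are not from the original answer.

```python
import pandas as pd

# Hypothetical input: any iterable of raw records would do.
input_rows = [("name0", 3, 3), ("name1", 2, 4), ("name2", 2, 8)]

rows_list = []
for lib, qty1, qty2 in input_rows:
    # One dictionary per row; keys are the column names.
    rows_list.append({"lib": lib, "qty1": qty1, "qty2": qty2})

# Build the DataFrame once, after all rows are collected.
df = pd.DataFrame(rows_list)
print(df)
```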

You could use pandas.concat() or DataFrame.append(). For details and examples, see Merge, join, and concatenate.
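
Note that `DataFrame.append()` was deprecated in pandas 1.4 and removed in pandas 2.0, so on current versions `pandas.concat()` is the option that still works. A small sketch, with made-up sample data:

```python
import pandas as pd

df = pd.DataFrame({"lib": ["name0"], "qty1": [3], "qty2": [3]})

# Wrap the new row in a one-row DataFrame, then concatenate.
new_row = pd.DataFrame([{"lib": "name1", "qty1": 2, "qty2": 4}])
df = pd.concat([df, new_row], ignore_index=True)
print(df)
```

With `ignore_index=True` the result gets a fresh 0..n-1 index instead of repeating the label 0.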

When adding a lot of rows to a dataframe, I am interested in performance. So I tried the four most popular methods and checked their speed.

Performance

  1. Using .append (NPE's answer)
  2. Using .loc (fred's answer)
  3. Using .loc with preallocation (FooBar's answer)
  4. Using a dict and creating the DataFrame at the end (ShikharDua's answer)

Runtime results (in seconds):

Approach                1000 rows   5000 rows   10 000 rows
.append                 0.69        3.39        6.78
.loc without prealloc   0.74        3.90        8.35
.loc with prealloc      0.24        2.58        8.70
dict                    0.012       0.046       0.084

So I use the dictionary approach myself.


Code:

import pandas as pd
import numpy as np
import time

# del df1, df2, df3, df4  # only needed when re-running in the same session
numOfRows = 1000
# append
startTime = time.perf_counter()
df1 = pd.DataFrame(np.random.randint(100, size=(5,5)), columns=['A', 'B', 'C', 'D', 'E'])
for i in range( 1,numOfRows-4):
    df1 = df1.append( dict( (a,np.random.randint(100)) for a in ['A','B','C','D','E']), ignore_index=True)
print('Elapsed time: {:6.3f} seconds for {:d} rows'.format(time.perf_counter() - startTime, numOfRows))
print(df1.shape)

# .loc w/o prealloc
startTime = time.perf_counter()
df2 = pd.DataFrame(np.random.randint(100, size=(5,5)), columns=['A', 'B', 'C', 'D', 'E'])
for i in range( 1,numOfRows):
    df2.loc[i]  = np.random.randint(100, size=(1,5))[0]
print('Elapsed time: {:6.3f} seconds for {:d} rows'.format(time.perf_counter() - startTime, numOfRows))
print(df2.shape)

# .loc with prealloc
df3 = pd.DataFrame(index=np.arange(0, numOfRows), columns=['A', 'B', 'C', 'D', 'E'] )
startTime = time.perf_counter()
for i in range( 1,numOfRows):
    df3.loc[i]  = np.random.randint(100, size=(1,5))[0]
print('Elapsed time: {:6.3f} seconds for {:d} rows'.format(time.perf_counter() - startTime, numOfRows))
print(df3.shape)

# dict
startTime = time.perf_counter()
row_list = []
for i in range (0,5):
    row_list.append(dict( (a,np.random.randint(100)) for a in ['A','B','C','D','E']))
for i in range( 1,numOfRows-4):
    dict1 = dict( (a,np.random.randint(100)) for a in ['A','B','C','D','E'])
    row_list.append(dict1)

df4 = pd.DataFrame(row_list, columns=['A','B','C','D','E'])
print('Elapsed time: {:6.3f} seconds for {:d} rows'.format(time.perf_counter() - startTime, numOfRows))
print(df4.shape)

PS: I believe my implementation isn't perfect, and there may be some optimizations that could be done.

NEVER grow a DataFrame!

Yes, people have already explained that you should NEVER grow a DataFrame, and that you should append your data to a list and convert it to a DataFrame once at the end. But do you understand why?

Here are the most important reasons, taken from my post here.

  1. It is always cheaper/faster to append to a list and create a DataFrame in one go.
  2. Lists take up less memory and are a much lighter data structure to work with, append, and remove.
  3. dtypes are automatically inferred for your data. On the flip side, creating an empty frame of NaNs will automatically make them object, which is bad.
  4. An index is automatically created for you, instead of you having to take care to assign the correct index to the row you are appending.
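
Point 3 can be checked directly. The sketch below builds the same three rows both ways (the data is invented for the example) and compares the resulting dtypes.

```python
import pandas as pd

rows = [[1, 2.5, "x"], [2, 3.5, "y"], [3, 4.5, "z"]]

# Built in one go from a list: dtypes are inferred per column.
df_list = pd.DataFrame(rows, columns=["A", "B", "C"])

# Preallocated frame of NaNs filled row by row: columns start as object.
df_prealloc = pd.DataFrame(columns=["A", "B", "C"], index=range(len(rows)))
for i, row in enumerate(rows):
    df_prealloc.loc[i] = row

print(df_list.dtypes)
print(df_prealloc.dtypes)
```

The values are the same in both frames; only the column dtypes differ, which is what makes later vectorized operations on the preallocated frame slower.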

This is The Right Way™ to accumulate your data

data = []
for a, b, c in some_function_that_yields_data():
    data.append([a, b, c])

df = pd.DataFrame(data, columns=['A', 'B', 'C'])

These options are horrible

  1. append or concat inside a loop

    append and concat aren't inherently bad in isolation. The problem starts when you iteratively call them inside a loop - this results in quadratic memory usage.

     # Creates empty DataFrame and appends
     df = pd.DataFrame(columns=['A', 'B', 'C'])
     for a, b, c in some_function_that_yields_data():
         df = df.append({'A': a, 'B': b, 'C': c}, ignore_index=True)
         # This is equally bad:
         # df = pd.concat(
         #     [df, pd.DataFrame([{'A': a, 'B': b, 'C': c}])],
         #     ignore_index=True)

  2. Empty DataFrame of NaNs

    Never create a DataFrame of NaNs as the columns are initialized with object (slow, un-vectorizable dtype).

     # Creates DataFrame of NaNs and overwrites values.
     df = pd.DataFrame(columns=['A', 'B', 'C'], index=range(5))
     for a, b, c in some_function_that_yields_data():
         df.loc[len(df)] = [a, b, c]
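
To see where the quadratic cost in option 1 comes from, note that every `concat` builds a new frame containing all rows accumulated so far. The sketch below only counts result rows as a rough proxy for copying work; the helper name `grow_with_concat` is made up for this example.

```python
import pandas as pd

def grow_with_concat(n_rows):
    """Grow a frame one row at a time and count the rows written along the way."""
    columns = ["A", "B", "C"]
    df = pd.DataFrame([[0, 0, 0]], columns=columns)
    rows_written = len(df)
    for i in range(1, n_rows):
        new_row = pd.DataFrame([[i, i, i]], columns=columns)
        # Each call allocates a new frame holding every row so far.
        df = pd.concat([df, new_row], ignore_index=True)
        rows_written += len(df)
    return df, rows_written

df, rows_written = grow_with_concat(100)
print(len(df), rows_written)  # 100 rows kept, 100 * 101 / 2 = 5050 rows written
```

Appending to a list writes each row once, so the same 100 rows cost about 100 writes instead of 5050.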

The Proof is in the Pudding

Timing these methods is the fastest way to see just how much they differ in terms of their memory and utility.

Benchmarking code for reference.


It's posts like this that remind me why I'm a part of this community. People understand the importance of teaching folks to get the right answer with the right code, not the right answer with the wrong code. Now you might argue that it is not an issue to use loc or append if you're only adding a single row to your DataFrame. However, people often look to this question to add more than just one row - often the requirement is to iteratively add a row inside a loop using data that comes from a function (see related question). In that case it is important to understand that iteratively growing a DataFrame is not a good idea.

If you know the number of entries ex ante, you should preallocate the space by also providing the index (taking the data example from a different answer):

import pandas as pd
import numpy as np
# we know we're gonna have 5 rows of data
numberOfRows = 5
# create dataframe
df = pd.DataFrame(index=np.arange(0, numberOfRows), columns=('lib', 'qty1', 'qty2') )

# now fill it up row by row
for x in np.arange(0, numberOfRows):
    #loc or iloc both work here since the index is natural numbers
    df.loc[x] = [np.random.randint(-1,1) for n in range(3)]

In[23]: df
Out[23]: 
   lib  qty1  qty2
0   -1    -1    -1
1    0     0     0
2   -1     0    -1
3    0    -1     0
4   -1     0     0

Speed comparison

In[30]: %timeit tryThis() # function wrapper for this answer
In[31]: %timeit tryOther() # function wrapper without index (see, for example, @fred)
1000 loops, best of 3: 1.23 ms per loop
100 loops, best of 3: 2.31 ms per loop

And - as from the comments - with a size of 6000, the speed difference becomes even larger:

Increasing the size of the array (12) and the number of rows (500) makes the speed difference more striking: 313ms vs 2.29s

mycolumns = ['A', 'B']
df = pd.DataFrame(columns=mycolumns)
rows = [[1,2],[3,4],[5,6]]
for row in rows:
    df.loc[len(df)] = row

You can append a single row as a dictionary using the ignore_index option.

>>> f = pandas.DataFrame(data = {'Animal':['cow','horse'], 'Color':['blue', 'red']})
>>> f
  Animal Color
0    cow  blue
1  horse   red
>>> f.append({'Animal':'mouse', 'Color':'black'}, ignore_index=True)
  Animal  Color
0    cow   blue
1  horse    red
2  mouse  black

For efficient appending, see How to add an extra row to a pandas dataframe and Setting With Enlargement.

Add rows through loc/ix on a non-existing key in the index. For example:

In [1]: se = pd.Series([1,2,3])

In [2]: se
Out[2]:
0    1
1    2
2    3
dtype: int64

In [3]: se[5] = 5.

In [4]: se
Out[4]:
0    1.0
1    2.0
2    3.0
5    5.0
dtype: float64

Or:

In [1]: dfi = pd.DataFrame(np.arange(6).reshape(3,2),
   .....:                 columns=['A','B'])
   .....:

In [2]: dfi
Out[2]:
   A  B
0  0  1
1  2  3
2  4  5

In [3]: dfi.loc[:,'C'] = dfi.loc[:,'A']

In [4]: dfi
Out[4]:
   A  B  C
0  0  1  0
1  2  3  2
2  4  5  4
In [5]: dfi.loc[3] = 5

In [6]: dfi
Out[6]:
   A  B  C
0  0  1  0
1  2  3  2
2  4  5  4
3  5  5  5

For the sake of a Pythonic way:

res = pd.DataFrame(columns=('lib', 'qty1', 'qty2'))
res = res.append([{'qty1':10.0}], ignore_index=True)
print(res.head())

   lib  qty1  qty2
0  NaN  10.0   NaN

You can also build up a list of lists and convert it to a dataframe:

import pandas as pd

columns = ['i','double','square']
rows = []

for i in range(6):
    row = [i, i*2, i*i]
    rows.append(row)

df = pd.DataFrame(rows, columns=columns)

giving

   i  double  square
0  0       0       0
1  1       2       1
2  2       4       4
3  3       6       9
4  4       8      16
5  5      10      25

I figured out a simple and nice way:

>>> df
     A  B  C
one  1  2  3
>>> df.loc["two"] = [4,5,6]
>>> df
     A  B  C
one  1  2  3
two  4  5  6

Note the caveat with performance as noted in the comments.

This is not an answer to the OP question, but a toy example to illustrate ShikharDua's answer, which I found very useful.

While this fragment is trivial, in the actual data I had thousands of rows and many columns, and I wished to be able to group by different columns and then perform the statistics below for more than one target column. So having a reliable method for building the data frame one row at a time was a great convenience. Thank you ShikharDua!

import pandas as pd

BaseData = pd.DataFrame({ 'Customer' : ['Acme','Mega','Acme','Acme','Mega','Acme'],
                          'Territory'  : ['West','East','South','West','East','South'],
                          'Product'  : ['Econ','Luxe','Econ','Std','Std','Econ']})
BaseData

columns = ['Customer','Num Unique Products', 'List Unique Products']

rows_list=[]
for name, group in BaseData.groupby('Customer'):
    RecordtoAdd={} #initialise an empty dict
    RecordtoAdd.update({'Customer' : name}) #
    RecordtoAdd.update({'Num Unique Products' : len(pd.unique(group['Product']))})
    RecordtoAdd.update({'List Unique Products' : pd.unique(group['Product'])})

    rows_list.append(RecordtoAdd)

AnalysedData = pd.DataFrame(rows_list)

print('Base Data : \n',BaseData,'\n\n Analysed Data : \n',AnalysedData)

You can use a generator object to create a DataFrame, which will be more memory-efficient than a list.

num = 10

# Generator function to generate generator object
def numgen_func(num):
    for i in range(num):
        yield ('name_{}'.format(i), (i*i), (i*i*i))

# Generator expression to generate generator object (it can be consumed only once and cannot be reused)
numgen_expression = (('name_{}'.format(i), (i*i), (i*i*i)) for i in range(num) )

df = pd.DataFrame(data=numgen_func(num), columns=('lib', 'qty1', 'qty2'))

To add a row to an existing DataFrame, you can use the append method.

df = df.append([{ 'lib': "name_20", 'qty1': 20, 'qty2': 400  }])

Create a new record (data frame) and add it to old_data_frame.

Pass a list of values and the corresponding column names to create a new_record (data frame):

new_record = pd.DataFrame([[0, 'abcd', 0, 1, 123]], columns=['a', 'b', 'c', 'd', 'e'])

old_data_frame = pd.concat([old_data_frame, new_record])

If you always want to add a new row at the end, use this:

df.loc[len(df)] = ['name5', 9, 0]
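
One caveat worth knowing: `df.loc[len(df)]` only appends when the label `len(df)` is not already in the index. A sketch with invented data, where a dropped row makes the label collide:

```python
import pandas as pd

df = pd.DataFrame({"lib": ["name0", "name1", "name2"], "qty1": [1, 2, 3]})
df = df.drop(index=0)            # index is now [1, 2], so len(df) == 2

df.loc[len(df)] = ["name3", 4]   # label 2 already exists: that row is overwritten
overwritten = df.copy()

# Resetting to a default 0..n-1 index first makes len(df) a fresh label.
df = df.reset_index(drop=True)
df.loc[len(df)] = ["name4", 5]
print(df)
```

After the first assignment the frame still has two rows; only after `reset_index(drop=True)` does the same statement add a third.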

Here is a way to add/append a row in a Pandas DataFrame:

def add_row(df, row):
    df.loc[-1] = row
    df.index = df.index + 1
    return df.sort_index()

add_row(df, [1,2,3])

It can be used to insert/append a row in an empty or populated Pandas DataFrame.
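
A usage sketch with invented data. Note that, as written, `add_row` places the new row at the top (index 0) rather than at the end, and it also modifies the frame passed in.

```python
import pandas as pd

def add_row(df, row):
    df.loc[-1] = row            # temporary label -1
    df.index = df.index + 1     # shift every label up by one
    return df.sort_index()      # the new row now sorts first

df = pd.DataFrame({"A": [10, 20], "B": [30, 40], "C": [50, 60]})
result = add_row(df, [1, 2, 3])
print(result)
```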

Instead of a list of dictionaries as in ShikharDua's answer, we can also represent our table as a dictionary of lists, where each list stores one column in row order, given we know our columns beforehand. At the end we construct our DataFrame once.

For c columns and n rows, this uses one dictionary and c lists, versus one list and n dictionaries. The list-of-dictionaries method has each dictionary storing all keys and requires creating a new dictionary for every row. Here we only append to lists, which is amortized constant time and theoretically very fast.

# Current data
data = {"Animal":["cow", "horse"], "Color":["blue", "red"]}

# Adding a new row (be careful to ensure every column gets another value)
data["Animal"].append("mouse")
data["Color"].append("black")

# At the end, construct our DataFrame
df = pd.DataFrame(data)
#   Animal  Color
# 0    cow   blue
# 1  horse    red
# 2  mouse  black
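
Since each column list must receive exactly one value per row, a small helper can keep the lists in step. This is a sketch; `append_row` is a hypothetical helper, not part of pandas.

```python
import pandas as pd

def append_row(data, row):
    """Append one value to every column list, refusing partial rows."""
    if set(row) != set(data):
        raise ValueError("row keys must match the columns exactly")
    for column, value in row.items():
        data[column].append(value)

data = {"Animal": ["cow", "horse"], "Color": ["blue", "red"]}
append_row(data, {"Animal": "mouse", "Color": "black"})

# Construct the DataFrame once, at the end.
df = pd.DataFrame(data)
print(df)
```

Checking the keys up front avoids ending with lists of different lengths, which the DataFrame constructor would reject.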

If you want to add a row at the end, append it as a list:

valuestoappend = [val1, val2, val3]
res = res.append(pd.Series(valuestoappend, index = ['lib', 'qty1', 'qty2']), ignore_index = True)

Another way to do it (probably not very performant):

# add a row
def add_row(df, row):
    colnames = list(df.columns)
    ncol = len(colnames)
    assert ncol == len(row), "Length of row must be the same as width of DataFrame: %s" % row
    return df.append(pd.DataFrame([row], columns=colnames))

You can also enhance the DataFrame class like this:

import pandas as pd
def add_row(self, row):
    self.loc[len(self.index)] = row
pd.DataFrame.add_row = add_row

All you need is loc[df.shape[0]] or loc[len(df)]


# Assuming your df has 4 columns (str, int, str, bool)
df.loc[df.shape[0]] = ['col1Value', 100, 'col3Value', False] 

or

df.loc[len(df)] = ['col1Value', 100, 'col3Value', False]

initial_data = {'lib': np.array([1,2,3,4]), 'qty1': [1,2,3,4], 'qty2': [1,2,3,4]}

df = pd.DataFrame(initial_data)

df

   lib  qty1  qty2
0    1     1     1
1    2     2     2
2    3     3     3
3    4     4     4

val_1 = [10]
val_2 = [14]
val_3 = [20]

df.append(pd.DataFrame({'lib': val_1, 'qty1': val_2, 'qty2': val_3}))

   lib  qty1  qty2
0    1     1     1
1    2     2     2
2    3     3     3
3    4     4     4
0   10    14    20

You can use a for loop to iterate through values or can add arrays of values.

val_1 = [10, 11, 12, 13]
val_2 = [14, 15, 16, 17]
val_3 = [20, 21, 22, 43]

df.append(pd.DataFrame({'lib': val_1, 'qty1': val_2, 'qty2': val_3}))

   lib  qty1  qty2
0    1     1     1
1    2     2     2
2    3     3     3
3    4     4     4
0   10    14    20
1   11    15    21
2   12    16    22
3   13    17    43

Make it simple, by taking a list as input which will be appended as a row in the data frame:

import pandas as pd
res = pd.DataFrame(columns=('lib', 'qty1', 'qty2'))
for i in range(5):
    res_list = list(map(int, input().split()))
    res = res.append(pd.Series(res_list, index=['lib', 'qty1', 'qty2']), ignore_index=True)

You can concatenate two DataFrames for this. I came across this problem when adding a new row to an existing DataFrame with a character index (not numeric).

So, I put the data for the new row in a dict() and the index in a list.

new_dict = {put input for new row here}
new_list = [put your index here]

new_df = pd.DataFrame(data=new_dict, index=new_list)

df = pd.concat([existing_df, new_df])
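
A filled-in sketch of the same steps, with invented column names and index labels:

```python
import pandas as pd

existing_df = pd.DataFrame({"A": [1], "B": [2], "C": [3]}, index=["one"])

new_dict = {"A": [4], "B": [5], "C": [6]}   # values for the new row
new_list = ["two"]                           # index label for the new row

new_df = pd.DataFrame(data=new_dict, index=new_list)
df = pd.concat([existing_df, new_df])
print(df)
```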

We often see the construct df.loc[subscript] = … to assign to one DataFrame row. Mikhail_Sam posted benchmarks containing, among others, this construct as well as the method using dict and create DataFrame in the end. He found the latter to be the fastest by far.

But if we replace the df3.loc[i] = … (with preallocated DataFrame) in his code with df3.values[i] = …, the outcome changes significantly, in that this method performs similarly to the one using dict. So we should more often take the use of df.values[subscript] = … into consideration. However, note that .values takes a zero-based subscript, which may be different from the DataFrame.index.

pandas.DataFrame.append

DataFrame.append(self, other, ignore_index=False, verify_integrity=False, sort=False) → 'DataFrame'

Code

df = pd.DataFrame([[1, 2], [3, 4]], columns=list('AB'))
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=list('AB'))
df.append(df2)

With ignore_index set to True:

df.append(df2, ignore_index=True)

If all data in your DataFrame has the same dtype, you might use a NumPy array. You can write rows directly into the predefined array and convert it to a dataframe at the end. It seems to be even faster than converting a list of dicts.

import time

import pandas as pd
import numpy as np
from string import ascii_uppercase

startTime = time.perf_counter()
numcols, numrows = 5, 10000
# Preallocate the array, then overwrite it row by row
npdf = np.ones((numrows, numcols))
for row in range(numrows):
    npdf[row, 0:] = np.random.randint(0, 100, (1, numcols))
df5 = pd.DataFrame(npdf, columns=list(ascii_uppercase[:numcols]))
print('Elapsed time: {:6.3f} seconds for {:d} rows'.format(time.perf_counter() - startTime, numrows))
print(df5.shape)

If you have a data frame df and want to add a list new_list as a new row to df, you can simply do:

df.loc[len(df)] = new_list

If you want to add a new data frame new_df under data frame df, then you can use:

df.append(new_df)

This code snippet uses a list of dictionaries to update the data frame. It adds on to ShikharDua's and Mikhail_Sam's answers.

import pandas as pd
colour = ["red", "big", "tasty"]
fruits = ["apple", "banana", "cherry"]
dict1={}
feat_list=[]
for x in colour:
    for y in fruits:
#         print(x, y)
        dict1 = dict([('x',x),('y',y)])
#         print(f'dict 1 {dict1}')
        feat_list.append(dict1)
#         print(f'feat_list {feat_list}')
feat_df=pd.DataFrame(feat_list)
feat_df.to_csv('feat1.csv')

import pandas as pd
# number_of_rows, rows and columns are placeholders for your own data
t1 = pd.DataFrame()
for i in range(number_of_rows):
    # add rows as columns
    t1[i] = list(rows)
t1 = t1.transpose()
t1.columns = list(columns)

Before adding a row, we have to convert the dataframe to a dictionary. There the keys are the columns of the dataframe, and the values of each column are again stored in a dictionary whose keys are the index labels of the dataframe.

That idea led me to write the code below.

df2 = df.to_dict()
values = ["s_101", "hyderabad", 10, 20, 16, 13, 15, 12, 12, 13, 25, 26, 25, 27, "good", "bad"] # This is the total row that we are going to add
i = 0
for x in df.columns:   # Here df.columns gives us the main dictionary key
    df2[x][101] = values[i]   # Here the 101 is our index number. It is also the key of the sub dictionary
    i += 1
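
The loop above only updates the dictionary; to get a DataFrame back, the dictionary can be passed to the constructor again. A shortened sketch with invented columns (the original row has 16 values):

```python
import pandas as pd

df = pd.DataFrame({"id": ["s_100"], "city": ["delhi"], "score": [10]}, index=[100])

df2 = df.to_dict()   # {column: {index_label: value}}
values = ["s_101", "hyderabad", 20]
for column, value in zip(df.columns, values):
    df2[column][101] = value   # 101 is the new row's index label

df = pd.DataFrame(df2)   # rebuild the frame from the nested dictionary
print(df)
```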
