Pandas: set values to an empty dataframe

I have initialized an empty pandas dataframe that I am now trying to fill, but I keep running into the same error. This is the (simplified) code I am using:

import pandas as pd
cols = list("ABC")
df = pd.DataFrame(columns=cols)
# set the values for the first two rows
df.loc[0:2,:] = [[1,2],[3,4],[5,6]]

On running the above code I get the following error:

ValueError: cannot copy sequence with size 3 to array axis with dimension 0

I am not sure what's causing this. I tried the same using a single row at a time and it works ( df.loc[0,:] = [1,2,3] ). I thought this would be the logical extension when I want to handle more than one row, but clearly I am wrong. What's the correct way to do this? I need to enter values for multiple rows and columns at once. I can do it using a loop, but that's not what I am looking for.

Any help would be great. Thanks

Since you already have the columns in the empty dataframe, use them in the dataframe constructor, i.e.

import numpy as np
import pandas as pd
cols = list("ABC")
df = pd.DataFrame(columns=cols)

# transpose so each inner list becomes a column: shape (3, 2) -> (2, 3)
df = pd.DataFrame(np.array([[1,2],[3,4],[5,6]]).T, columns=df.columns)

   A  B  C
0  1  3  5
1  2  4  6
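
If you would rather not bring in numpy for this, a plain-Python transpose should give the same frame. This is only a sketch; the names data and rows are mine and do not come from the question.

```python
import pandas as pd

cols = list("ABC")
data = [[1, 2], [3, 4], [5, 6]]  # column-wise: A=[1, 2], B=[3, 4], C=[5, 6]

# zip(*data) transposes the nested list into rows: (1, 3, 5) and (2, 4, 6)
rows = list(zip(*data))

df = pd.DataFrame(rows, columns=cols)
print(df)
```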

Well, if you want to use loc specifically, then reindex the dataframe first and then assign, i.e.

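# here df is the empty dataframe (columns only), not the filled one above
arr = np.array([[1,2],[3,4],[5,6]]).T
# create the row labels 0..n-1 first, so that loc has rows to assign to
df = df.reindex(np.arange(arr.shape[0]))
df.loc[0:arr.shape[0],:] = arr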

   A  B  C
0  1  3  5
1  2  4  6
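
The same reindex-then-assign idea can probably be reused when more rows arrive later. Below is a rough sketch under the assumption that the existing row labels are always 0..n-1; append_block is a hypothetical helper name, not a pandas function.

```python
import numpy as np
import pandas as pd


def append_block(frame, block):
    """Hypothetical helper: add the rows of a 2-D block under new integer labels."""
    block = np.asarray(block)
    start = len(frame)  # assumes the existing labels are 0..n-1
    new_labels = list(range(start, start + block.shape[0]))
    # reindex creates the new (empty) rows, then loc fills them in one assignment
    frame = frame.reindex(list(frame.index) + new_labels)
    frame.loc[new_labels, :] = block
    return frame


df = pd.DataFrame(columns=list("ABC"))
df = append_block(df, np.array([[1, 2], [3, 4], [5, 6]]).T)
df = append_block(df, np.array([[7, 8, 9], [10, 11, 12], [13, 14, 15]]).T)
print(df)
```

Note that reindex returns a new dataframe, so the helper returns the result instead of modifying its argument in place.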

How about adding data by index, as below? You can call the function from outside whenever you receive new data.

def add_to_df(index, data):
    # data is column-wise; zip(*data) turns it into rows
    for idx, row in zip(index, zip(*data)):
        df.loc[idx] = row  # df is the module-level (initially empty) dataframe

# Set values for first two rows
data1 = [[1,2],[3,4],[5,6]]
index1 = [0,1]
add_to_df(index1, data1)
print(df)
print("")

# Set values for next three rows
data2 = [[7,8,9],[10,11,12],[13,14,15]]
index2 = [2,3,4]
add_to_df(index2, data2)
print(df)

Result

>>> 
     A    B    C
0  1.0  3.0  5.0
1  2.0  4.0  6.0

     A     B     C
0  1.0   3.0   5.0
1  2.0   4.0   6.0
2  7.0  10.0  13.0
3  8.0  11.0  14.0
4  9.0  12.0  15.0
>>> 
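
Since the question asks for a way without a loop over rows, one alternative (my suggestion, not part of the answer above) is to build one small dataframe per batch and join them with pd.concat. A minimal sketch, assuming each batch is column-wise like data1 and data2:

```python
import pandas as pd

cols = list("ABC")
batches = [
    [[1, 2], [3, 4], [5, 6]],                    # two rows, column-wise
    [[7, 8, 9], [10, 11, 12], [13, 14, 15]],     # three rows, column-wise
]

# one dataframe per batch: pair each column name with its list of values
frames = [pd.DataFrame(dict(zip(cols, batch)), columns=cols) for batch in batches]

# ignore_index=True renumbers the rows 0..n-1 across all batches
df = pd.concat(frames, ignore_index=True)
print(df)
```

Collecting the batches in a list and concatenating once is usually cheaper than growing the dataframe row by row.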

Looking through the documentation and doing some experiments, my guess is that loc only allows you to insert one key at a time. However, you can insert multiple keys first with reindex, as @Dark shows.

The .loc/[] operations can perform enlargement when setting a non-existent key for that axis.

http://pandas-docs.github.io/pandas-docs-travis/indexing.html#setting-with-enlargement
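
To make the one-key-at-a-time behaviour concrete, here is a small sketch of setting with enlargement. The row values are just illustrative.

```python
import pandas as pd

df = pd.DataFrame(columns=list("ABC"))

# each assignment uses a single label that does not exist yet,
# so loc enlarges the dataframe by one row
df.loc[0] = [1, 2, 3]
df.loc[1] = [4, 5, 6]

print(df)
```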

Also, when you use loc[:2, :] , you are selecting existing rows by label (with loc, both ends of the slice are inclusive). However, there is nothing in the empty df for you to select. There are no rows, while you are trying to insert 3 rows. Thus, the message says

ValueError: cannot copy sequence with size 3 to array axis with dimension 0

BTW, [[1,2],[3,4],[5,6]] will be 3 rows rather than 2.
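
A quick way to check this is to look at the array shape before assigning. The sketch below only uses numpy; the variable names are mine.

```python
import numpy as np

data = [[1, 2], [3, 4], [5, 6]]

arr = np.array(data)
print(arr.shape)    # (3, 2): three rows of two values each

# transposing gives two rows of three values, which matches columns A, B, C
print(arr.T.shape)  # (2, 3)
```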

Does this get the output you are looking for?

   import pandas as pd
   df=pd.DataFrame({'A':[1,2],'B':[3,4],'C':[5,6]})

Output:

      A  B  C
   0  1  3  5
   1  2  4  6
