Add columns of different length in pandas

I have a problem with adding columns in pandas. I have a DataFrame with dimensions n x k. During processing I will need to add columns with dimensions m x 1, where m is in [1, n], but I don't know m in advance.

When I try to do it:

df['Name column'] = data    
# type(data) = list

Result:

AssertionError: Length of values does not match length of index   

Can I add columns with different lengths?

If you use the accepted answer, you'll lose your column names, as shown in the accepted answer's example and described in the documentation (emphasis added):

The resulting axis will be labeled 0, ..., n - 1. This is useful if you are concatenating objects where the concatenation axis does not have meaningful indexing information.

It looks like the column names ('Name column') are meaningful to the original poster.

To keep the column names, use pandas.concat, but don't pass ignore_index=True (the default value of ignore_index is False, so you can omit that argument altogether). Continue to use axis=1:

import pandas

# Note these columns have 3 rows of values:
original = pandas.DataFrame({
    'Age':[10, 12, 13], 
    'Gender':['M','F','F']
})

# Note this column has 4 rows of values:
additional = pandas.DataFrame({
    'Name': ['Nate A', 'Jessie A', 'Daniel H', 'John D']
})

new = pandas.concat([original, additional], axis=1) 
# Identical:
# new = pandas.concat([original, additional], ignore_index=False, axis=1) 

print(new.head())

#     Age  Gender      Name
# 0  10.0       M    Nate A
# 1  12.0       F  Jessie A
# 2  13.0       F  Daniel H
# 3   NaN     NaN    John D

Notice how John D does not have an Age or a Gender.
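
For the case in the question, where the new data is a plain list no longer than the frame, a minimal sketch of an alternative is to wrap the list in a pandas.Series before assigning it. This assumes the DataFrame has the default 0..n-1 index; the sample values are made up for illustration. Assignment of a Series aligns on the index, so rows without a value become NaN and the column name is kept.

```python
import pandas as pd

df = pd.DataFrame({'Age': [10, 12, 13], 'Gender': ['M', 'F', 'F']})
data = ['Nate A', 'Jessie A']  # illustrative list, shorter than the frame

# pd.Series(data) gets the index 0..len(data)-1; column assignment aligns
# on that index and fills the rows that have no value with NaN.
df['Name column'] = pd.Series(data)
print(df)
```

Note that this only pads: if the list were longer than the frame, the extra values would not be added, so concat remains the option when the new column can be longer.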

Use concat and pass axis=1 and ignore_index=True:

In [38]:

import numpy as np
import pandas as pd
df = pd.DataFrame({'a':np.arange(5)})
df1 = pd.DataFrame({'b':np.arange(4)})
print(df1)
df
   b
0  0
1  1
2  2
3  3
Out[38]:
   a
0  0
1  1
2  2
3  3
4  4
In [39]:

pd.concat([df,df1], ignore_index=True, axis=1)
Out[39]:
   0    1
0  0  0.0
1  1  1.0
2  2  2.0
3  3  3.0
4  4  NaN

We can add lists of different sizes to a DataFrame.

Example

a = [0,1,2,3]
b = [0,1,2,3,4,5,6,7,8,9]
c = [0,1]

Find the length of each list

la,lb,lc = len(a),len(b),len(c)
# now find the max
max_len = max(la,lb,lc)

Resize all lists according to the determined max length (not in this example)

if not max_len == la:
  a.extend(['']*(max_len-la))
if not max_len == lb:
  b.extend(['']*(max_len-lb))
if not max_len == lc:
  c.extend(['']*(max_len-lc))

Now all the lists are the same length, so create the DataFrame

pd.DataFrame({'A':a,'B':b,'C':c}) 

The final output is

   A  B  C
0  0  0  0
1  1  1  1
2  2  2   
3  3  3   
4     4   
5     5   
6     6   
7     7   
8     8   
9     9   
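
As an alternative to padding by hand, a minimal sketch is to wrap each list in a pandas.Series and let the DataFrame constructor align them. This is not the method of the answer above: the gaps are filled with NaN instead of empty strings, and the shorter columns become float64 as a result.

```python
import pandas as pd

a = [0, 1, 2, 3]
b = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
c = [0, 1]

# Each Series gets its own 0..len-1 index; the constructor takes the union
# of those indexes and fills the missing positions with NaN.
padded = pd.DataFrame({'A': pd.Series(a), 'B': pd.Series(b), 'C': pd.Series(c)})
print(padded)
```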

I had the same issue: two different dataframes without a common column. I just needed to put them beside each other in a csv file.

  • Merge: In this case, merge does not work, even after adding a temporary column to both dfs and then dropping it, because this method gives both dfs the same length: it repeats the rows of the shorter dataframe to match the longer dataframe's length.
  • Concat: The idea from The Red Pea didn't work for me. It just appended the shorter df to the longer one (row-wise) while leaving an empty column (NaNs) above the shorter df's column.
  • Solution: You need to do the following:
df1 = df1.reset_index()
df2 = df2.reset_index()
df = [df1, df2]
df_final = pd.concat(df, axis=1)

df_final.to_csv(filename, index=False)

This way, you'll see your dfs beside each other (column-wise), each with its own length.
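
A likely reason the plain concat looked like a row-wise append is that the two dataframes had different index labels, since concat with axis=1 aligns rows on the index. The sketch below uses made-up frames to show that effect. It passes drop=True, which the answer above does not use, because a plain reset_index() also inserts the old index as an extra column named 'index'.

```python
import pandas as pd

# Illustrative frames whose index labels do not overlap
df1 = pd.DataFrame({'x': [1, 2, 3]}, index=[10, 11, 12])
df2 = pd.DataFrame({'y': ['a', 'b']}, index=[0, 1])

# Without resetting, rows are matched by label: 5 rows, with NaN blocks
misaligned = pd.concat([df1, df2], axis=1)

# After resetting both to 0..len-1, rows are matched by position: 3 rows
aligned = pd.concat(
    [df1.reset_index(drop=True), df2.reset_index(drop=True)], axis=1
)

print(misaligned)
print(aligned)
```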

If somebody wants to replace a specific column with one of a different size instead of adding it:

Based on this answer, I use a dict as an intermediate type: Create Pandas Dataframe with different sized columns

If the column to be inserted is not a list but already a dict, the respective line can be omitted.

import pandas as pd

def fill_column(dataframe: pd.DataFrame, values: list, column: str) -> pd.DataFrame:
    dict_from_list = dict(enumerate(values))  # map row position -> value

    dataframe_as_dict = dataframe.to_dict()  # get the DataFrame as a dict of dicts
    dataframe_as_dict[column] = dict_from_list  # replace the specific column

    # Build a new DataFrame from the dict; shorter columns are padded with NaN
    return pd.DataFrame.from_dict(dataframe_as_dict, orient='index').T
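
A usage sketch of the function, restated here so the block runs on its own. The sample frame and values are made up, and the list parameter is named values in this sketch to avoid shadowing the built-in.

```python
import pandas as pd

def fill_column(dataframe, values, column):
    # Same approach as above: go through a dict of dicts, then rebuild
    as_dict = dataframe.to_dict()
    as_dict[column] = dict(enumerate(values))
    return pd.DataFrame.from_dict(as_dict, orient='index').T

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

# Replace column 'b' with a longer list; column 'a' is padded with NaN
result = fill_column(df, [9, 8, 7, 6, 5], 'b')
print(result)
```

The round trip through a dict can change column dtypes (here the integer columns come back as floats), so cast them back afterwards if that matters.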
