Add columns of different length in pandas
I have a problem adding columns in pandas. I have a DataFrame with dimensions n x k. During processing I will need to add columns of dimension m x 1, where m is in [1, n], but I don't know m in advance.
When I try to do it:
df['Name column'] = data
# type(data) = list
Result:
AssertionError: Length of values does not match length of index
Can I add columns with different lengths?
If you use the accepted answer, you'll lose your column names, as shown in the accepted answer's example and described in the documentation (emphasis added):
The resulting axis will be labeled 0, ..., n - 1. This is useful if you are concatenating objects where the concatenation axis does not have meaningful indexing information.
It looks like the column names ('Name column') are meaningful to the original poster and the original question.
To keep the column names, use pandas.concat, but don't set ignore_index (its default value is False, so you can omit that argument altogether). Continue to use axis=1:
import pandas
# Note these columns have 3 rows of values:
original = pandas.DataFrame({
'Age':[10, 12, 13],
'Gender':['M','F','F']
})
# Note this column has 4 rows of values:
additional = pandas.DataFrame({
'Name': ['Nate A', 'Jessie A', 'Daniel H', 'John D']
})
new = pandas.concat([original, additional], axis=1)
# Identical:
# new = pandas.concat([original, additional], ignore_index=False, axis=1)
print(new.head())
#     Age Gender     Name
# 0  10.0      M   Nate A
# 1  12.0      F Jessie A
# 2  13.0      F Daniel H
# 3   NaN    NaN   John D
Notice that John D does not have an Age or a Gender.
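For the original `df['Name column'] = data` case, the same index alignment can be used directly. The following is a minimal sketch, assuming the frame has the default 0..n-1 index and the list is never longer than the frame (m <= n, as the question states); the sample values are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Age": [10, 12, 13], "Gender": ["M", "F", "F"]})
data = ["Nate A", "Jessie A"]  # shorter than the frame: m = 2, n = 3

# Wrapping the list in a Series gives it an index (0, 1), so pandas
# aligns it with the frame's index and fills the missing row with NaN
# instead of raising a length-mismatch error.
df["Name column"] = pd.Series(data)
print(df)
```

Note that with this approach any Series rows whose labels are not in the frame's index would be dropped rather than appended, which is why the sketch assumes m <= n.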
Use concat and pass axis=1 and ignore_index=True:
In [38]:
import numpy as np
import pandas as pd
df = pd.DataFrame({'a': np.arange(5)})
df1 = pd.DataFrame({'b': np.arange(4)})
print(df1)
df
   b
0  0
1  1
2  2
3  3
Out[38]:
   a
0  0
1  1
2  2
3  3
4  4
In [39]:
pd.concat([df, df1], ignore_index=True, axis=1)
Out[39]:
   0    1
0  0  0.0
1  1  1.0
2  2  2.0
3  3  3.0
4  4  NaN
We can add lists of different sizes to a DataFrame.
Example:
a = [0,1,2,3]
b = [0,1,2,3,4,5,6,7,8,9]
c = [0,1]
Find the length of each list:
la,lb,lc = len(a),len(b),len(c)
# now find the max
max_len = max(la,lb,lc)
Resize all lists to the determined max length (not in this example):
# pad each shorter list with empty strings up to max_len
if la != max_len:
    a.extend([''] * (max_len - la))
if lb != max_len:
    b.extend([''] * (max_len - lb))
if lc != max_len:
    c.extend([''] * (max_len - lc))
Now all the lists have the same length, so create the DataFrame:
import pandas as pd
pd.DataFrame({'A': a, 'B': b, 'C': c})
The final output is:
   A  B  C
0  0  0  0
1  1  1  1
2  2  2
3  3  3
4     4
5     5
6     6
7     7
8     8
9     9
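A possible alternative to padding by hand, sketched here under the assumption that NaN is an acceptable filler instead of the empty string: wrap each list in a Series and let the DataFrame constructor align them on the integer index.

```python
import pandas as pd

a = [0, 1, 2, 3]
b = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
c = [0, 1]

# Each list becomes a Series indexed 0..len-1; the constructor takes the
# union of those indexes and fills the gaps in shorter columns with NaN.
df = pd.DataFrame({"A": pd.Series(a), "B": pd.Series(b), "C": pd.Series(c)})
print(df)
```

The padded columns become floating point because NaN is a float, so this trades the empty strings for numeric columns that still support arithmetic.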
I had the same issue: two different DataFrames without a common column. I just needed to put them beside each other in a CSV file.
df1 = df1.reset_index()
df2 = df2.reset_index()
df = [df1, df2]
df_final = pd.concat(df, axis=1)
df_final.to_csv(filename, index=False)
This way, you'll see your DataFrames beside each other (column-wise), each with its own length.
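As a hedged variation on the snippet above: `reset_index()` keeps each old index as an extra column named `index`, so the CSV ends up with those columns too. If the old labels are not needed, passing `drop=True` discards them. The frames and labels below are made-up sample data.

```python
import pandas as pd

# Two frames with unrelated index labels and different lengths
df1 = pd.DataFrame({"a": [1, 2, 3]}, index=[10, 11, 12])
df2 = pd.DataFrame({"b": ["x", "y"]}, index=[7, 8])

# drop=True discards the old labels, so both frames are renumbered from 0
# and concat lines the rows up by position; the shorter frame gets NaN.
side_by_side = pd.concat(
    [df1.reset_index(drop=True), df2.reset_index(drop=True)], axis=1
)
print(side_by_side)
```

Without the reset, concat would align on the labels 7, 8, 10, 11, 12 and produce five rows instead of three.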
In case somebody wants to replace a specific column with one of a different size instead of adding it:
Based on this answer (Create Pandas Dataframe with different sized columns), I use a dict as an intermediate type.
If the column to be inserted is not a list but already a dict, the respective line can be omitted.
import pandas as pd

def fill_column(dataframe: pd.DataFrame, values: list, column: str) -> pd.DataFrame:
    dict_from_list = dict(enumerate(values))  # map row position -> value
    dataframe_as_dict = dataframe.to_dict()  # get the DataFrame as a dict of columns
    dataframe_as_dict[column] = dict_from_list  # assign the specific column
    # Build a new DataFrame from the dict and return it
    return pd.DataFrame.from_dict(dataframe_as_dict, orient='index').T
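A short usage sketch may help here. The function is repeated so the example runs on its own; the sample frame and the replacement values are made up, and the frame is assumed to have the default 0..n-1 index so that the enumerated positions match the row labels.

```python
import pandas as pd

def fill_column(dataframe, values, column):
    # Same approach as above: go through a dict of columns
    as_dict = dataframe.to_dict()
    as_dict[column] = dict(enumerate(values))
    return pd.DataFrame.from_dict(as_dict, orient="index").T

df = pd.DataFrame({"x": [1, 2, 3], "y": ["a", "b", "c"]})

# Replace the 3-row column "y" with a 5-row list; the frame grows to
# 5 rows and the untouched column "x" is padded with NaN.
result = fill_column(df, ["p", "q", "r", "s", "t"], "y")
print(result)
```

Because the frame is rebuilt through a transpose, the resulting columns generally come back with object dtype, so a follow-up call such as `result.infer_objects()` may be worth considering.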