
How to add a new column to an existing DataFrame?

I have the following indexed DataFrame with named columns and rows that are not continuous numbers:

          a         b         c         d
2  0.671399  0.101208 -0.181532  0.241273
3  0.446172 -0.243316  0.051767  1.577318
5  0.614758  0.075793 -0.451460 -0.012493

I would like to add a new column, 'e', to the existing data frame and do not want to change anything in the data frame (i.e., the new column always has the same length as the DataFrame).

0   -0.335485
1   -1.166658
2   -0.385571
dtype: float64

How can I add column e to the above example?

Edit 2017

As indicated in the comments and by @Alexander, currently the best method to add the values of a Series as a new column of a DataFrame could be using assign:

df1 = df1.assign(e=pd.Series(np.random.randn(sLength)).values)

Edit 2015
Some reported getting the SettingWithCopyWarning with this code.
However, the code still runs perfectly with the current pandas version 0.16.1.

>>> sLength = len(df1['a'])
>>> df1
          a         b         c         d
6 -0.269221 -0.026476  0.997517  1.294385
8  0.917438  0.847941  0.034235 -0.448948

>>> df1['e'] = pd.Series(np.random.randn(sLength), index=df1.index)
>>> df1
          a         b         c         d         e
6 -0.269221 -0.026476  0.997517  1.294385  1.757167
8  0.917438  0.847941  0.034235 -0.448948  2.228131

>>> pd.version.short_version
'0.16.1'

The SettingWithCopyWarning aims to warn about a possibly invalid assignment on a copy of the DataFrame. It doesn't necessarily mean you did it wrong (it can trigger false positives), but from 0.13.0 on it lets you know there are more adequate methods for the same purpose. So if you get the warning, just follow its advice: Try using .loc[row_index,col_indexer] = value instead

>>> df1.loc[:,'f'] = pd.Series(np.random.randn(sLength), index=df1.index)
>>> df1
          a         b         c         d         e         f
6 -0.269221 -0.026476  0.997517  1.294385  1.757167 -0.050927
8  0.917438  0.847941  0.034235 -0.448948  2.228131  0.006109
>>> 

In fact, this is currently the more efficient method as described in the pandas docs.


Original answer:

Use the original df1 indexes to create the series:

df1['e'] = pd.Series(np.random.randn(sLength), index=df1.index)
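A minimal sketch of why the index=df1.index part matters, using the question's frame (index 2, 3, 5) and a default-indexed Series (index 0, 1, 2). The values are copied from the question; the reduced set of columns is just for brevity.

```python
import pandas as pd

# The question's frame: its index is 2, 3, 5 (not 0..n-1)
df1 = pd.DataFrame(
    {"a": [0.671399, 0.446172, 0.614758],
     "b": [0.101208, -0.243316, 0.075793]},
    index=[2, 3, 5],
)
# The new values arrive with a default RangeIndex 0, 1, 2
e = pd.Series([-0.335485, -1.166658, -0.385571])

# Plain assignment aligns on index labels: only label 2 exists on both sides
aligned = df1.copy()
aligned["e"] = e
print(aligned["e"].tolist())  # [-0.385571, nan, nan]

# Re-labelling the values with df1's index assigns them row by row
fixed = df1.copy()
fixed["e"] = pd.Series(e.values, index=df1.index)
print(fixed["e"].tolist())  # [-0.335485, -1.166658, -0.385571]
```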

Here is a simple way to add a new column: df['e'] = e

I would like to add a new column, 'e', to the existing data frame and do not change anything in the data frame. (The series always has the same length as the dataframe.)

I assume that the index values in e match those in df1.

The easiest way to initiate a new column named e and assign it the values from your series e:

df['e'] = e.values

assign (Pandas 0.16.0+)

As of Pandas 0.16.0, you can also use assign, which assigns new columns to a DataFrame and returns a new object (a copy) with all the original columns in addition to the new ones.

df1 = df1.assign(e=e.values)
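Because assign returns a new object, the result has to be bound to a name; the frame you call it on is not modified. A small sketch with made-up data:

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1.0, 2.0, 3.0]}, index=[2, 3, 5])
e = pd.Series([10.0, 20.0, 30.0])  # default index 0, 1, 2

# assign returns a new DataFrame; df1 itself is left untouched
df2 = df1.assign(e=e.values)  # .values: a plain array is placed by position

print("e" in df1.columns)  # False
print(df2["e"].tolist())   # [10.0, 20.0, 30.0]
```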

As per this example (which also includes the source code of the assign function), you can also include more than one column:

>>> df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
>>> df.assign(mean_a=df.a.mean(), mean_b=df.b.mean())
   a  b  mean_a  mean_b
0  1  3     1.5     3.5
1  2  4     1.5     3.5

In the context of your example:

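import numpy as np
import pandas as pd

np.random.seed(0)
df1 = pd.DataFrame(np.random.randn(10, 4), columns=['a', 'b', 'c', 'd'])
mask = df1 < -0.7  # True wherever a value is below -0.7
df1 = df1[~mask.any(axis=1)]  # keep rows with no such value (~ negates a boolean Series)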
sLength = len(df1['a'])
e = pd.Series(np.random.randn(sLength))

>>> df1
          a         b         c         d
0  1.764052  0.400157  0.978738  2.240893
2 -0.103219  0.410599  0.144044  1.454274
3  0.761038  0.121675  0.443863  0.333674
7  1.532779  1.469359  0.154947  0.378163
9  1.230291  1.202380 -0.387327 -0.302303

>>> e
0   -1.048553
1   -1.420018
2   -1.706270
3    1.950775
4   -0.509652
dtype: float64

df1 = df1.assign(e=e.values)

>>> df1
          a         b         c         d         e
0  1.764052  0.400157  0.978738  2.240893 -1.048553
2 -0.103219  0.410599  0.144044  1.454274 -1.420018
3  0.761038  0.121675  0.443863  0.333674 -1.706270
7  1.532779  1.469359  0.154947  0.378163  1.950775
9  1.230291  1.202380 -0.387327 -0.302303 -0.509652

A description of this feature from when it was first introduced can be found here.

Super simple column assignment

A pandas DataFrame behaves like an ordered dict of columns.

This means that not only can __getitem__ [] be used to get a certain column, but __setitem__ [] = can also be used to assign a new column.

For example, this dataframe can have a column added to it by simply using the [] accessor:

    size      name color
0    big      rose   red
1  small    violet  blue
2  small     tulip   red
3  small  harebell  blue

df['protected'] = ['no', 'no', 'no', 'yes']

    size      name color protected
0    big      rose   red        no
1  small    violet  blue        no
2  small     tulip   red        no
3  small  harebell  blue       yes

Note that this works even if the dataframe's index is not in the default 0..n-1 order.

df.index = [3,2,1,0]
df['protected'] = ['no', 'no', 'no', 'yes']
    size      name color protected
3    big      rose   red        no
2  small    violet  blue        no
1  small     tulip   red        no
0  small  harebell  blue       yes
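The reason a plain list ignores the index is that it carries no labels, so pandas places it row by row; the flip side is that its length has to match exactly. A small sketch, rebuilding the flower frame from the example above:

```python
import pandas as pd

df = pd.DataFrame(
    {"size": ["big", "small", "small", "small"],
     "name": ["rose", "violet", "tulip", "harebell"],
     "color": ["red", "blue", "red", "blue"]},
    index=[3, 2, 1, 0],
)

# A list has no index, so it is placed by position, top to bottom
df["protected"] = ["no", "no", "no", "yes"]
print(df.loc[0, "protected"])  # yes (the last row, whose label is 0)

# A list of the wrong length is rejected rather than padded
try:
    df["too_short"] = ["no", "yes"]
except ValueError as exc:
    print("ValueError:", exc)
```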

[]= is the way to go, but watch out!

However, if you have a pd.Series and try to assign it to a dataframe whose index is in a different order, you will run into trouble. See example:

df['protected'] = pd.Series(['no', 'no', 'no', 'yes'])
    size      name color protected
3    big      rose   red       yes
2  small    violet  blue        no
1  small     tulip   red        no
0  small  harebell  blue        no

This is because a pd.Series by default has an index enumerated from 0 to n-1, and the pandas [] = method tries to be "smart": it matches rows by index label rather than by position.

What is actually going on

When you use the [] = method (df['column'] = series), pandas quietly aligns the right-hand series on the index of the left-hand dataframe. In effect this is a left join on the dataframe's index: rows whose label is missing from the series get NaN, and series labels that are not in the dataframe are dropped.
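One way to see this, as a sketch with made-up data: assigning a Series gives the same column as first reindexing the Series to the frame's index.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3]}, index=[3, 2, 1])
s = pd.Series([10, 20, 30], index=[1, 2, 99])  # label 99 is not in df

df["via_setitem"] = s                  # implicit alignment
df["via_reindex"] = s.reindex(df.index)  # the same alignment, spelled out

print(df["via_setitem"].tolist())  # [nan, 20.0, 10.0]
```

Label 3 has no match in s, so it becomes NaN (which also turns the column into floats), and the value under label 99 is silently dropped.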

Side note

This quickly causes cognitive dissonance, since the []= method is trying to do a lot of different things depending on the input, and the outcome cannot be predicted unless you already know how pandas works. I would therefore advise against []= in code bases, but when exploring data in a notebook, it is fine.

Working around the problem

If you have a pd.Series and want it assigned from top to bottom, or if you are writing production code and you are not sure of the index order, it is worth safeguarding against this kind of issue.

You could downcast the pd.Series to a np.ndarray or a list; this will do the trick.

df['protected'] = pd.Series(['no', 'no', 'no', 'yes']).values

or

df['protected'] = list(pd.Series(['no', 'no', 'no', 'yes']))

But this is not very explicit.

Some coder may come along and say "Hey, this looks redundant, I'll just optimize this away".

Explicit way

Setting the index of the pd.Series to be the index of the df is explicit.

df['protected'] = pd.Series(['no', 'no', 'no', 'yes'], index=df.index)

Or, more realistically, you probably have a pd.Series already available.

protected_series = pd.Series(['no', 'no', 'no', 'yes'])
protected_series.index = df.index

3     no
2     no
1     no
0    yes

It can now be assigned:

df['protected'] = protected_series

    size      name color protected
3    big      rose   red        no
2  small    violet  blue        no
1  small     tulip   red        no
0  small  harebell  blue       yes
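Setting protected_series.index mutates the Series in place. If you would rather leave it untouched, set_axis returns a relabelled copy instead. This is a sketch assuming a pandas version (1.0 or later, as far as I know) where set_axis returns a new object by default:

```python
import pandas as pd

df = pd.DataFrame({"name": ["rose", "violet", "tulip", "harebell"]},
                  index=[3, 2, 1, 0])
protected_series = pd.Series(["no", "no", "no", "yes"])

# set_axis returns a relabelled copy, so protected_series keeps its 0..3 index
df["protected"] = protected_series.set_axis(df.index)

print(df["protected"].tolist())         # ['no', 'no', 'no', 'yes']
print(protected_series.index.tolist())  # [0, 1, 2, 3]
```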

Alternative way with df.reset_index()

Since the index mismatch is the problem, if you feel that the index of the dataframe should not dictate things, you can simply drop the index. This should be faster, but it is not very clean, since your function now probably does two things.

df = df.reset_index(drop=True)  # reset_index returns a new object; keep the result
protected_series = protected_series.reset_index(drop=True)
df['protected'] = protected_series

    size      name color protected
0    big      rose   red        no
1  small    violet  blue        no
2  small     tulip   red        no
3  small  harebell  blue       yes

Note on df.assign

While df.assign makes it more explicit what you are doing, it actually has all the same problems as the []= above.

df.assign(protected=pd.Series(['no', 'no', 'no', 'yes']))
    size      name color protected
3    big      rose   red       yes
2  small    violet  blue        no
1  small     tulip   red        no
0  small  harebell  blue        no

Just watch out with df.assign that your column is not called self. It will cause errors. This makes df.assign smelly, since there are these kinds of artifacts in the function.

df.assign(self=pd.Series(['no', 'no', 'no', 'yes']))
TypeError: assign() got multiple values for keyword argument 'self'

You may say, "Well, I'll just not use self then". But who knows how this function will change in the future to support new arguments. Maybe your column name will become an argument in a new pandas release, causing problems when upgrading.
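A small sketch of the self collision and one way around it. The clash comes from Python's argument binding: assign is defined with self as its first parameter, so a keyword named self is a duplicate. Plain item assignment has no such restriction:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2]})

# assign(self=...) collides with the method's own `self` parameter
try:
    df.assign(self=[3, 4])
except TypeError as exc:
    print("TypeError:", exc)

# Item assignment on a copy accepts any column name
out = df.copy()
out["self"] = [3, 4]
print(out.columns.tolist())  # ['a', 'self']
```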

It seems that in recent Pandas versions the way to go is to use df.assign:

df1 = df1.assign(e=np.random.randn(sLength))

It doesn't produce a SettingWithCopyWarning.

Doing this directly via NumPy will be the most efficient:

df1['e'] = np.random.randn(sLength)

Note my original (very old) suggestion was to use map (which is much slower):

df1['e'] = df1['a'].map(lambda x: np.random.random())

Easiest ways:

data['new_col'] = list_of_values

data.loc[ : , 'new_col'] = list_of_values

This way you avoid what is called chained indexing when setting new values in a pandas object. Click here to read further.
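To illustrate chained indexing, here is a sketch with made-up data. The chained form writes into a temporary object, while a single .loc call writes into the frame itself. Depending on the pandas version, the chained line emits a SettingWithCopyWarning or a chained-assignment warning; either way the original frame is not changed.

```python
import pandas as pd

df = pd.DataFrame({"a": [-1.0, 2.0, 3.0]})
mask = df["a"] > 0

# Chained indexing: df[mask] builds a new object, so the assignment lands on
# that temporary and df never gets an 'e' column (pandas may warn here)
df[mask]["e"] = 1.0
print("e" in df.columns)  # False

# One .loc call selects rows and column together and writes into df;
# rows outside the mask are filled with NaN
df.loc[mask, "e"] = 1.0
print(df["e"].tolist())  # [nan, 1.0, 1.0]
```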

If you want to set the whole new column to an initial base value (e.g. None), you can do this: df1['e'] = None

This actually assigns the "object" dtype to the column, so later you're free to put complex data types, like lists, into individual cells.
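A quick check of this, as a sketch with a made-up frame. df.at is used for the single-cell write because it addresses exactly one cell, so the list is stored as one object rather than being unpacked:

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2, 3]})

# Broadcasting None creates an object-dtype column
df1["e"] = None
print(df1["e"].dtype)  # object

# An object column can hold arbitrary Python objects, e.g. a list in one cell
df1.at[0, "e"] = [1, 2]
print(df1.at[0, "e"])  # [1, 2]
```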

I got the dreaded SettingWithCopyWarning, and it wasn't fixed by using the iloc syntax. My DataFrame was created by read_sql from an ODBC source. Using a suggestion by lowtech above, the following worked for me:

df.insert(len(df.columns), 'e', pd.Series(np.random.randn(sLength),  index=df.index))

This worked fine to insert the column at the end. I don't know if it is the most efficient, but I don't like warning messages. I think there is a better solution, but I can't find it, and I think it depends on some aspect of the index.
Note: this only works once and will give an error message if you try to overwrite an existing column.
Note: as above, from 0.16.0 on, assign is the best solution. See the documentation http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.assign.html#pandas.DataFrame.assign It works well for data-flow style code where you don't overwrite your intermediate values.
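A sketch of the "only works once" behaviour with a made-up frame. DataFrame.insert modifies the frame in place, refuses an existing name by default (its allow_duplicates parameter defaults to False), and plain assignment is the way to overwrite:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Insert 'e' as the last column; insert works in place and returns None
df.insert(len(df.columns), "e", [5, 6])
print(df.columns.tolist())  # ['a', 'b', 'e']

# A second insert with the same name is refused by default...
try:
    df.insert(len(df.columns), "e", [7, 8])
except ValueError as exc:
    print("ValueError:", exc)

# ...so to overwrite an existing column, use plain assignment instead
df["e"] = [7, 8]
print(df["e"].tolist())  # [7, 8]
```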

  1. First create a Python list, list_of_e, that has the relevant data.
  2. Use this: df['e'] = list_of_e

Create an empty column:

df['i'] = None

If the column you are trying to add is a series variable, then just:

df["new_columns_name"]=series_variable_name #this will do it for you

This works well even if you are replacing an existing column. Just use the same new_columns_name as the column you want to replace; it will overwrite the existing column data with the new series data.

If the data frame and Series object have the same index, pandas.concat also works here:

import pandas as pd
df
#          a            b           c           d
#0  0.671399     0.101208   -0.181532    0.241273
#1  0.446172    -0.243316    0.051767    1.577318
#2  0.614758     0.075793   -0.451460   -0.012493

e = pd.Series([-0.335485, -1.166658, -0.385571])    
e
#0   -0.335485
#1   -1.166658
#2   -0.385571
#dtype: float64

# here we need to give the series object a name which converts to the new  column name 
# in the result
df = pd.concat([df, e.rename("e")], axis=1)
df

#          a            b           c           d           e
#0  0.671399     0.101208   -0.181532    0.241273   -0.335485
#1  0.446172    -0.243316    0.051767    1.577318   -1.166658
#2  0.614758     0.075793   -0.451460   -0.012493   -0.385571

In case they don't have the same index:

e.index = df.index
df = pd.concat([df, e.rename("e")], axis=1)

Foolproof:

df.loc[:, 'NewCol'] = 'New_Val'

Example:

df = pd.DataFrame(data=np.random.randn(20, 4), columns=['A', 'B', 'C', 'D'])

df

           A         B         C         D
0  -0.761269  0.477348  1.170614  0.752714
1   1.217250 -0.930860 -0.769324 -0.408642
2  -0.619679 -1.227659 -0.259135  1.700294
3  -0.147354  0.778707  0.479145  2.284143
4  -0.529529  0.000571  0.913779  1.395894
5   2.592400  0.637253  1.441096 -0.631468
6   0.757178  0.240012 -0.553820  1.177202
7  -0.986128 -1.313843  0.788589 -0.707836
8   0.606985 -2.232903 -1.358107 -2.855494
9  -0.692013  0.671866  1.179466 -1.180351
10 -1.093707 -0.530600  0.182926 -1.296494
11 -0.143273 -0.503199 -1.328728  0.610552
12 -0.923110 -1.365890 -1.366202 -1.185999
13 -2.026832  0.273593 -0.440426 -0.627423
14 -0.054503 -0.788866 -0.228088 -0.404783
15  0.955298 -1.430019  1.434071 -0.088215
16 -0.227946  0.047462  0.373573 -0.111675
17  1.627912  0.043611  1.743403 -0.012714
18  0.693458  0.144327  0.329500 -0.655045
19  0.104425  0.037412  0.450598 -0.923387


df.drop([3, 5, 8, 10, 18], inplace=True)

df

           A         B         C         D
0  -0.761269  0.477348  1.170614  0.752714
1   1.217250 -0.930860 -0.769324 -0.408642
2  -0.619679 -1.227659 -0.259135  1.700294
4  -0.529529  0.000571  0.913779  1.395894
6   0.757178  0.240012 -0.553820  1.177202
7  -0.986128 -1.313843  0.788589 -0.707836
9  -0.692013  0.671866  1.179466 -1.180351
11 -0.143273 -0.503199 -1.328728  0.610552
12 -0.923110 -1.365890 -1.366202 -1.185999
13 -2.026832  0.273593 -0.440426 -0.627423
14 -0.054503 -0.788866 -0.228088 -0.404783
15  0.955298 -1.430019  1.434071 -0.088215
16 -0.227946  0.047462  0.373573 -0.111675
17  1.627912  0.043611  1.743403 -0.012714
19  0.104425  0.037412  0.450598 -0.923387

df.loc[:, 'NewCol'] = 0

df
           A         B         C         D  NewCol
0  -0.761269  0.477348  1.170614  0.752714       0
1   1.217250 -0.930860 -0.769324 -0.408642       0
2  -0.619679 -1.227659 -0.259135  1.700294       0
4  -0.529529  0.000571  0.913779  1.395894       0
6   0.757178  0.240012 -0.553820  1.177202       0
7  -0.986128 -1.313843  0.788589 -0.707836       0
9  -0.692013  0.671866  1.179466 -1.180351       0
11 -0.143273 -0.503199 -1.328728  0.610552       0
12 -0.923110 -1.365890 -1.366202 -1.185999       0
13 -2.026832  0.273593 -0.440426 -0.627423       0
14 -0.054503 -0.788866 -0.228088 -0.404783       0
15  0.955298 -1.430019  1.434071 -0.088215       0
16 -0.227946  0.047462  0.373573 -0.111675       0
17  1.627912  0.043611  1.743403 -0.012714       0
19  0.104425  0.037412  0.450598 -0.923387       0

One thing to note, though, is that if you do

df1['e'] = Series(np.random.randn(sLength), index=df1.index)

this will effectively be a left join on df1.index. So if you want an outer join effect, my probably imperfect solution is to create a dataframe with index values covering the universe of your data, and then use the code above. For example,

data = pd.DataFrame(index=all_possible_values)  # all_possible_values: every label you need
data['e'] = Series(np.random.randn(sLength), index=df1.index)
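If the goal really is an outer join, DataFrame.join with how="outer" can do it directly. A sketch with made-up data: labels from either side are kept, and cells with no partner become NaN.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1.0, 2.0]}, index=[0, 1])
e = pd.Series([10.0, 20.0], index=[1, 2])

# Name the Series so it becomes column 'e', then keep the union of both indexes
result = df1.join(e.rename("e"), how="outer")
print(result.index.tolist())  # [0, 1, 2]
print(result["e"].tolist())   # [nan, 10.0, 20.0]
```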

To insert a new column at a given location (0 <= loc <= number of columns) in a data frame, just use DataFrame.insert:

DataFrame.insert(loc, column, value)

Therefore, if you want to add the column e at the end of a data frame called df, you can use:

e = [-0.335485, -1.166658, -0.385571]
df.insert(loc=len(df.columns), column='e', value=e)  # call insert on the frame, not the class

value can be a Series, a scalar such as an integer (in which case all cells get filled with this one value), or an array-like structure.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.insert.html
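A sketch of the loc and scalar cases on a made-up frame. loc counts positions among the columns that exist at the moment of the call:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# loc=0 puts the new column first; a scalar is broadcast to every row
df.insert(0, "flag", 0)
# loc=2 now places 'e' between 'a' and 'b'
df.insert(2, "e", [-0.335485, -1.166658, -0.385571])

print(df.columns.tolist())  # ['flag', 'a', 'e', 'b']
```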

Let me just add that, just like for hum3, .loc didn't solve the SettingWithCopyWarning and I had to resort to df.insert(). In my case the false positive was generated by "fake" chain indexing dict['a']['e'], where 'e' is the new column, and dict['a'] is a DataFrame coming from a dictionary.

Also note that if you know what you are doing, you can switch off the warning using pd.options.mode.chained_assignment = None and then use one of the other solutions given here.

Before assigning a new column, if you have indexed data, you need to sort the index. At least in my case I had to:

data.set_index(['index_column'], inplace=True)
# if the index is unsorted, assignment of a new column will fail
data.sort_index(inplace=True)
data.loc['index_value1', 'column_y'] = np.random.randn(data.loc['index_value1', 'column_x'].shape[0])

Add a new column 'e' to the existing data frame:

df1.loc[:, 'e'] = Series(np.random.randn(sLength), index=df1.index)

I was looking for a general way of adding a column of numpy.nan values to a dataframe without getting the dumb SettingWithCopyWarning.

From the following:

  • the answers here
  • this question about passing a variable as a keyword argument
  • this method for generating a numpy array of NaNs in-line

I came up with this:

col = 'column_name'
df = df.assign(**{col:numpy.full(len(df), numpy.nan)})

For the sake of completeness, here is yet another solution using the DataFrame.eval() method:

Data:

In [44]: e
Out[44]:
0    1.225506
1   -1.033944
2   -0.498953
3   -0.373332
4    0.615030
5   -0.622436
dtype: float64

In [45]: df1
Out[45]:
          a         b         c         d
0 -0.634222 -0.103264  0.745069  0.801288
4  0.782387 -0.090279  0.757662 -0.602408
5 -0.117456  2.124496  1.057301  0.765466
7  0.767532  0.104304 -0.586850  1.051297
8 -0.103272  0.958334  1.163092  1.182315
9 -0.616254  0.296678 -0.112027  0.679112

Solution:

In [46]: df1.eval("e = @e.values", inplace=True)

In [47]: df1
Out[47]:
          a         b         c         d         e
0 -0.634222 -0.103264  0.745069  0.801288  1.225506
4  0.782387 -0.090279  0.757662 -0.602408 -1.033944
5 -0.117456  2.124496  1.057301  0.765466 -0.498953
7  0.767532  0.104304 -0.586850  1.051297 -0.373332
8 -0.103272  0.958334  1.163092  1.182315  0.615030
9 -0.616254  0.296678 -0.112027  0.679112 -0.622436

If you just need to create a new empty column, the shortest solution is:

df.loc[:, 'e'] = pd.Series(dtype='float64')

The following is what I did... But I'm pretty new to pandas and really Python in general, so no promises.

df = pd.DataFrame([[1, 2], [3, 4], [5,6]], columns=list('AB'))

newCol = [3,5,7]
newName = 'C'

values = np.insert(df.values,df.shape[1],newCol,axis=1)
header = df.columns.values.tolist()
header.append(newName)

df = pd.DataFrame(values,columns=header)

If we want to assign a scalar value, e.g. 10, to all rows of a new column in a df:

df = df.assign(new_col=lambda x: 10)  # x is the whole DataFrame passed in to the lambda

df will now have a new column 'new_col' with the value 10 in all rows.
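Since assign passes the whole DataFrame to a callable (not one row at a time), a scalar can be given directly, and a callable is most useful when the new column is computed from existing ones. A sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})

# A scalar is broadcast directly; no lambda is needed
df = df.assign(new_col=10)

# The callable receives the DataFrame, so it can use existing columns
df = df.assign(double_a=lambda d: d["a"] * 2)

print(df["new_col"].tolist())   # [10, 10, 10]
print(df["double_a"].tolist())  # [2, 4, 6]
```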

If you get the SettingWithCopyWarning, an easy fix is to copy the DataFrame you are trying to add a column to.

df = df.copy()
df['col_name'] = values
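A typical place where this helps is a filtered subset. A sketch with made-up data: the explicit .copy() makes the subset own its data, so adding a column to it is unambiguous and never touches the original frame.

```python
import pandas as pd

df = pd.DataFrame({"a": [-1, 2, 3]})

# An explicit copy of the filtered rows owns its data
positives = df[df["a"] > 0].copy()

positives["col_name"] = positives["a"] * 10  # writes only to the copy
print(positives["col_name"].tolist())  # [20, 30]
print("col_name" in df.columns)        # False
```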
x=pd.DataFrame([1,2,3,4,5])

y=pd.DataFrame([5,4,3,2,1])

z=pd.concat([x,y],axis=1)

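One thing to watch with this concat approach: both frames have a single column labelled 0, so the result has two columns with the same label. A sketch of naming the pieces first (the names "x" and "y" are just illustrative):

```python
import pandas as pd

x = pd.DataFrame([1, 2, 3, 4, 5])
y = pd.DataFrame([5, 4, 3, 2, 1])

z_dup = pd.concat([x, y], axis=1)
print(z_dup.columns.tolist())  # [0, 0], two columns share the label 0

# Renaming before concatenating keeps the labels distinct
z = pd.concat([x.rename(columns={0: "x"}), y.rename(columns={0: "y"})], axis=1)
print(z.columns.tolist())  # ['x', 'y']
```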

This is a special case of adding a new column to a pandas dataframe. Here, I am adding a new feature/column based on the existing column data of the dataframe.

So, let our DataFrame have the columns 'feature_1', 'feature_2', 'probability_score', and we have to add a new column 'predicted_class' based on the data in column 'probability_score'.

I will use the pandas Series.map() method and also define a function of my own, which implements the logic for giving a particular class_label to every row in my DataFrame.

import pandas as pd

data = pd.read_csv('data.csv')

def myFunction(x):
    # implement your logic here; the 0.5 threshold and the
    # labels 1/0 are placeholders, not part of the original answer
    if x >= 0.5:
        return 1
    return 0

variable_1 = data['probability_score']
predicted_class = variable_1.map(myFunction)

data['predicted_class'] = predicted_class

# check the DataFrame: the new column is derived from an existing column, row by row
data.head()
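For a simple threshold rule like this, a vectorized alternative to map is numpy.where, which evaluates the whole column at once. A sketch assuming the same hypothetical 0.5 cut-off, with a small stand-in frame instead of data.csv:

```python
import numpy as np
import pandas as pd

# Stand-in for data.csv: only the column used below is needed
data = pd.DataFrame({"probability_score": [0.1, 0.7, 0.5, 0.3]})

THRESHOLD = 0.5  # hypothetical cut-off, not from the original answer
data["predicted_class"] = np.where(data["probability_score"] >= THRESHOLD, 1, 0)
print(data["predicted_class"].tolist())  # [0, 1, 1, 0]
```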
import pandas as pd

# Define a dictionary containing data
data = {'a': [0,0,0.671399,0.446172,0,0.614758],
    'b': [0,0,0.101208,-0.243316,0,0.075793],
    'c': [0,0,-0.181532,0.051767,0,-0.451460],
    'd': [0,0,0.241273,1.577318,0,-0.012493]}

# Convert the dictionary into DataFrame
df = pd.DataFrame(data)

# Declare a list that is to be converted into a column
col_e = [-0.335485,-1.166658,-0.385571,0,0,0]

# add column 'e'
df['e'] = col_e

# Observe the result
df


Whenever you add a Series object as a new column to an existing DF, you need to make sure that they both have the same index. Then add it to the DF:

e_series = pd.Series([-0.335485, -1.166658,-0.385571])
print(e_series)
e_series.index = d_f.index
d_f['e'] = e_series
d_f


You can insert a new column with a for loop like this:

for label,row in your_dframe.iterrows():
      your_dframe.loc[label,"new_column_length"]=len(row["any_of_column_in_your_dframe"])

Sample code:

import pandas as pd

data = {
  "any_of_column_in_your_dframe" : ["ersingulbahar","yagiz","TS"],
  "calories": [420, 380, 390],
  "duration": [50, 40, 45]
}

#load data into a DataFrame object:
your_dframe = pd.DataFrame(data)


for label,row in your_dframe.iterrows():
      your_dframe.loc[label,"new_column_length"]=len(row["any_of_column_in_your_dframe"])
      
      
print(your_dframe) 

And the output:

   any_of_column_in_your_dframe  calories  duration  new_column_length
0                 ersingulbahar       420        50               13.0
1                         yagiz       380        40                5.0
2                            TS       390        45                2.0

Note: you can also do it like this:

your_dframe["new_column_length"]=your_dframe["any_of_column_in_your_dframe"].apply(len)
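For string columns there is also the vectorized .str.len() accessor, which avoids both the Python-level loop and the float column that the loop version produces (the new column starts as NaN there). A sketch reusing the sample data above; the resulting integer dtype is typically int64, though the exact dtype may vary with the pandas string backend:

```python
import pandas as pd

your_dframe = pd.DataFrame({
    "any_of_column_in_your_dframe": ["ersingulbahar", "yagiz", "TS"],
    "calories": [420, 380, 390],
})

# .str.len() computes every string length in one call
your_dframe["new_column_length"] = your_dframe["any_of_column_in_your_dframe"].str.len()
print(your_dframe["new_column_length"].tolist())  # [13, 5, 2]
```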

A simple way to add new columns to the existing dataframe is:

new_cols = ['a' , 'b' , 'c' , 'd']

for col in new_cols:
    df[col] = 0  # assign 0 as a placeholder

print(df.columns)
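The same placeholders can also be added in one call without a loop: dict.fromkeys maps every new name to the same value, and assign adds them all at once (returning a new DataFrame). A sketch with a made-up starting frame:

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3]})
new_cols = ["a", "b", "c", "d"]

# Every new column gets the placeholder 0; assign returns a new frame
df = df.assign(**dict.fromkeys(new_cols, 0))
print(df.columns.tolist())  # ['x', 'a', 'b', 'c', 'd']
```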

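Statement: the technical posts on this site follow the CC BY-SA 4.0 license. If you need to repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.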

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM