Adding a new column with a specific dtype in pandas

Question

Can we assign a new column to a pandas DataFrame and also declare its datatype in one fell swoop?

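import pandas as pd

df = pd.DataFrame({'BP': ['100/80'], 'Sex': ['M']})
# Raw strings (r'...') keep the regex escape \d from being treated as a string escape.
df2 = (df.drop('BP', axis=1)
       .assign(BPS=lambda x: df.BP.str.extract(r'(?P<BPS>\d+)/'))
       .assign(BPD=lambda x: df.BP.str.extract(r'/(?P<BPD>\d+)'))
       )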

print(df2)
df2.dtypes

Can we get a float dtype (e.g. np.float64) using only the chained expression?

Answer 1

Add astype when you assign the values:

df2 = (df.drop('BP',axis=1)
       .assign(BPS=lambda x: df.BP.str.extract(r'(?P<BPS>\d+)/').astype(float))
       .assign(BPD=lambda x: df.BP.str.extract(r'/(?P<BPD>\d+)').astype(float))
       )
df2.dtypes
Sex     object
BPS    float64
BPD    float64
dtype: object

What I would do:

df.assign(**df.pop('BP').str.extract(r'(?P<BPS>\d+)/(?P<BPD>\d+)').astype(float))
  Sex    BPS   BPD
0   M  100.0  80.0
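
As a variation on the one-liner above, here is a minimal sketch that leaves the original df untouched (no pop) and sets the dtype per column by passing a dict to DataFrame.astype at the end of the chain. The intermediate name parts is only illustrative, and float64 is assumed as the target dtype for both columns.

```python
import pandas as pd

df = pd.DataFrame({'BP': ['100/80'], 'Sex': ['M']})

# One extract call with two named groups yields a DataFrame with columns BPS and BPD.
parts = df['BP'].str.extract(r'(?P<BPS>\d+)/(?P<BPD>\d+)')

df2 = (df.drop(columns='BP')
         .assign(**parts)  # unpack the extracted columns into assign
         .astype({'BPS': 'float64', 'BPD': 'float64'}))  # dtype chosen per column

print(df2.dtypes)
```

The dict form of astype only converts the listed columns, so Sex keeps its original dtype.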

Answer 2

Obviously, you don't have to do this, but you can.

# drop(columns=...) and set_axis without inplace work on current pandas versions
df.drop(columns='BP').join(
    df['BP'].str.split('/', expand=True)
            .set_axis(['BPS', 'BPD'], axis=1)
            .astype(float))

  Sex    BPS   BPD
0   M  100.0  80.0

Your two str.extract calls can be replaced with a single str.split call, followed by a single astype call.

Personally, if you ask me about style, I would say this looks more elegant:

u = (df['BP'].str.split('/', expand=True)
             .set_axis(['BPS', 'BPD'], axis=1)
             .astype(float))
df.drop(columns='BP').join(u)


  Sex    BPS   BPD
0   M  100.0  80.0

Answer 3

Use df.insert:

import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
print('df to start with:', df, '\ndtypes:', df.dtypes, sep='\n')
print('\n')

df.insert(
    len(df.columns), 'new col 1', pd.Series([[1, 2, 3], 'a'], dtype=object))
df.insert(
    len(df.columns), 'new col 2', pd.Series([1, 2, 3]))
df.insert(
    len(df.columns), 'new col 3', pd.Series([1., 2, 3]))
print('df with columns added:', df, '\ndtypes:', df.dtypes, sep='\n')

Output:

df to start with:
   a  b
0  1  2
1  3  4

dtypes:
a    int64
b    int64
dtype: object


df with columns added:
   a  b  new col 1  new col 2  new col 3
0  1  2  [1, 2, 3]          1        1.0
1  3  4          a          2        2.0

dtypes:
a              int64
b              int64
new col 1     object
new col 2      int64
new col 3    float64
dtype: object
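
For the question's BP data, the same idea might look like the sketch below. It is only an illustration: the names bps and result are made up here, and the column is extracted with expand=False so that a Series is passed to insert. Note that df.insert modifies the frame in place and returns None, so it does not fit into a method chain.

```python
import pandas as pd

df = pd.DataFrame({'BP': ['100/80'], 'Sex': ['M']})

# Build the new column with the desired dtype first.
bps = df['BP'].str.extract(r'(\d+)/', expand=False).astype(float)

# insert() mutates df and returns None, so it cannot be chained.
result = df.insert(len(df.columns), 'BPS', bps)

print(result)
print(df.dtypes)
```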

Answer 4

Just assign NumPy arrays of the required dtype (inspired by a related question/answer).

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': np.array([1, 2, 3], dtype=int),
    'b': np.array([4, 5, 6], dtype=float),
    })
print('df to start with:', df, '\ndtypes:', df.dtypes, sep='\n')
print('\n')

df['new col 1'] = np.array([[1, 2, 3], 'a', np.nan], dtype=object)
df['new col 2'] = np.array([1, 2, 3], dtype=int)
df['new col 3'] = np.array([1, 2, 3], dtype=float)
print('df with columns added:', df, '\ndtypes:', df.dtypes, sep='\n')

Output:

df to start with:
   a    b
0  1  4.0
1  2  5.0
2  3  6.0

dtypes:
a      int64
b    float64
dtype: object


df with columns added:
   a    b  new col 1  new col 2  new col 3
0  1  4.0  [1, 2, 3]          1        1.0
1  2  5.0          a          2        2.0
2  3  6.0        NaN          3        3.0

dtypes:
a              int64
b            float64
new col 1     object
new col 2      int64
new col 3    float64
dtype: object
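
Since the question asked for a chained expression, the same trick can also be used inside assign. This is a small sketch with made-up column names; float32 and int8 are chosen only to show that non-default dtypes are preserved.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3]})

# assign returns a new frame; each array keeps the dtype it was created with.
df2 = df.assign(
    b=np.array([4, 5, 6], dtype=np.float32),
    c=np.array([7, 8, 9], dtype=np.int8),
)

print(df2.dtypes)
```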


Answer 1: 4, 2019-01-23 03:10:02

Answer 2: 4, accepted, 2019-01-23 03:11:41

Answer 3: 1, 2021-04-09 07:19:35

Answer 4: 0, 2021-04-09 08:14:46
