Adding a new column with specific dtype in pandas
Can we assign a new column to a pandas DataFrame and also declare its dtype in one fell swoop?
import pandas as pd
df = pd.DataFrame({'BP': ['100/80'],'Sex': ['M']})
df2 = (df.drop('BP',axis=1)
       .assign(BPS = lambda x: df.BP.str.extract(r'(?P<BPS>\d+)/'))
       .assign(BPD = lambda x: df.BP.str.extract(r'/(?P<BPD>\d+)'))
)
print(df2)
df2.dtypes
Can we get a float dtype (np.float64) using only the chained expression?
Add astype when you assign the values:
df2 = (df.drop('BP',axis=1)
       .assign(BPS = lambda x: df.BP.str.extract(r'(?P<BPS>\d+)/').astype(float))
       .assign(BPD = lambda x: df.BP.str.extract(r'/(?P<BPD>\d+)').astype(float))
)
df2.dtypes
Sex object
BPS float64
BPD float64
dtype: object
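As a variation on the same idea, the cast can also be done once at the end of the chain by passing a column-to-dtype dict to DataFrame.astype. This is only a sketch: the two-row sample frame is made up for illustration, and a single extract call with two named groups stands in for the two separate calls above.

```python
import pandas as pd

# Hypothetical sample data, extended to two rows for illustration.
df = pd.DataFrame({'BP': ['100/80', '120/70'], 'Sex': ['M', 'F']})

df2 = (
    # One extract call with two named groups yields columns BPS and BPD.
    df.join(df['BP'].str.extract(r'(?P<BPS>\d+)/(?P<BPD>\d+)'))
      .drop(columns='BP')
      # Cast only the new columns; other columns keep their dtype.
      .astype({'BPS': float, 'BPD': float})
)
print(df2.dtypes)
```

Casting with a dict keeps the dtype declaration in one place, which may be easier to maintain when several new columns are added in the same chain.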
What I would do:
df.assign(**df.pop('BP').str.extract(r'(?P<BPS>\d+)/(?P<BPD>\d+)').astype(float))
Sex BPS BPD
0 M 100.0 80.0
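Note that df.pop('BP') removes the column from df in place. If the original frame should stay untouched, a non-mutating sketch of the same one-liner could look like this (the names extracted and out are placeholders chosen for this example):

```python
import pandas as pd

df = pd.DataFrame({'BP': ['100/80'], 'Sex': ['M']})

# Extract both readings in one pass and cast them together.
extracted = df['BP'].str.extract(r'(?P<BPS>\d+)/(?P<BPD>\d+)').astype(float)

# drop returns a new frame, so df itself still contains 'BP'.
out = df.drop(columns='BP').assign(**extracted)
print(out)
```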
Obviously, you don't have to do this, but you can.
df.drop(columns='BP').join(
    df['BP'].str.split('/', expand=True)
            .set_axis(['BPS', 'BPD'], axis=1)
            .astype(float))
Sex BPS BPD
0 M 100.0 80.0
Your two str.extract calls can be done away with in favour of a single str.split call. You can then make one astype call.
Personally, if you ask me about style, I would say this looks more elegant:
u = (df['BP'].str.split('/', expand=True)
             .set_axis(['BPS', 'BPD'], axis=1)
             .astype(float))
df.drop(columns='BP').join(u)
Sex BPS BPD
0 M 100.0 80.0
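One caveat with a plain astype(float) is that it raises if any piece is not numeric. A hedged sketch of a more forgiving variant, assuming unparseable readings should become NaN, runs pd.to_numeric with errors='coerce' on each column first. The second row ('120/?') is invented to show the effect.

```python
import pandas as pd

# Hypothetical data: the second reading has an unparseable diastolic value.
df = pd.DataFrame({'BP': ['100/80', '120/?'], 'Sex': ['M', 'F']})

parts = (df['BP'].str.split('/', expand=True)
                 .set_axis(['BPS', 'BPD'], axis=1)
                 # Non-numeric strings become NaN instead of raising.
                 .apply(pd.to_numeric, errors='coerce')
                 # to_numeric may return int64 for clean columns, so cast explicitly.
                 .astype(float))
out = df.drop(columns='BP').join(parts)
print(out)
print(out.dtypes)
```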
import pandas as pd
df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
print('df to start with:', df, '\ndtypes:', df.dtypes, sep='\n')
print('\n')
df.insert(
len(df.columns), 'new col 1', pd.Series([[1, 2, 3], 'a'], dtype=object))
df.insert(
len(df.columns), 'new col 2', pd.Series([1, 2, 3]))
df.insert(
len(df.columns), 'new col 3', pd.Series([1., 2, 3]))
print('df with columns added:', df, '\ndtypes:', df.dtypes, sep='\n')
Output:
df to start with:
a b
0 1 2
1 3 4
dtypes:
a int64
b int64
dtype: object
df with columns added:
a b new col 1 new col 2 new col 3
0 1 2 [1, 2, 3] 1 1.0
1 3 4 a 2 2.0
dtypes:
a int64
b int64
new col 1 object
new col 2 int64
new col 3 float64
dtype: object
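The output above also shows that df.insert aligns the Series on the index: the three-element Series were cut down to the two rows of df. A small sketch of what that means for dtypes, using made-up column names 'c' and 'd': a Series that covers every row keeps its dtype, while one that leaves a row unfilled gets NaN there and, for integers, is upcast to float64.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 3], 'b': [2, 4]})

# Covers index 0 and 1 (plus an extra label 2 that is ignored): dtype is kept.
df.insert(len(df.columns), 'c', pd.Series([1.5, 2.5, 3.5], dtype='float32'))

# Covers only index 0: row 1 becomes NaN, so int64 is upcast to float64.
df.insert(len(df.columns), 'd', pd.Series([10], dtype='int64'))

print(df)
print(df.dtypes)
```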
Just assign numpy arrays of the required type (inspired by a related question/answer).
import numpy as np
import pandas as pd
df = pd.DataFrame({
'a': np.array([1, 2, 3], dtype=int),
'b': np.array([4, 5, 6], dtype=float),
})
print('df to start with:', df, '\ndtypes:', df.dtypes, sep='\n')
print('\n')
df['new col 1'] = np.array([[1, 2, 3], 'a', np.nan], dtype=object)
df['new col 2'] = np.array([1, 2, 3], dtype=int)
df['new col 3'] = np.array([1, 2, 3], dtype=float)
print('df with columns added:', df, '\ndtypes:', df.dtypes, sep='\n')
Output:
df to start with:
a b
0 1 4.0
1 2 5.0
2 3 6.0
dtypes:
a int64
b float64
dtype: object
df with columns added:
a b new col 1 new col 2 new col 3
0 1 4.0 [1, 2, 3] 1 1.0
1 2 5.0 a 2 2.0
2 3 6.0 NaN 3 3.0
dtypes:
a int64
b float64
new col 1 object
new col 2 int64
new col 3 float64
dtype: object
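The same approach should carry over to narrower dtypes, since the column takes on the dtype of the array. Unlike a Series, though, an array has no index to align on, so its length has to match the frame. A minimal sketch, with hypothetical column names:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3]})

# The column dtype follows the array dtype.
df['small_int'] = np.array([1, 2, 3], dtype=np.int8)
df['zeros32'] = np.zeros(len(df), dtype=np.float32)

# An array of the wrong length is rejected rather than aligned.
try:
    df['bad'] = np.array([1, 2], dtype=float)
    length_error = False
except ValueError:
    length_error = True

print(df.dtypes)
print('length mismatch raised:', length_error)
```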