Assign value to a column based on other columns from the same pandas dataframe
In a dataframe, I have one column called TM52_fail:
2
1
-
1 & 2
1 & 2 & 3
-
-
3
etc.
I would like to create an additional column, called TM52_fail_norm, whose content depends on the content of the column TM52_fail.
My attempt (which includes the conditional filling):
def str_to_number(x):
    if x=="1" or x=="2" or x=="3":
        return 1
    elif x=="1 & 2" or x=="2 & 3" or x=="1 & 3":
        return 2
    elif x=="1 & 2 & 3":
        return 3
    else:
        return 0
df['TM52_fail_norm'] = ""
df['TM52_fail_norm'].apply(lambda x: str_to_number(x for x in df['TM52_fail']))
This returns an empty column (I presume as a result of df['TM52_fail_norm'] = "").
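A note on why the attempt above leaves the column empty, as a minimal sketch that assumes the sample values shown in the question. Three things seem to go wrong at once: apply is called on the new (empty) column rather than on TM52_fail, the lambda hands str_to_number a generator expression instead of a single string, and the result of apply is never assigned back. The names unassigned and still_empty are only illustrative.

```python
import pandas as pd

def str_to_number(x):
    if x in ("1", "2", "3"):
        return 1
    elif x in ("1 & 2", "2 & 3", "1 & 3"):
        return 2
    elif x == "1 & 2 & 3":
        return 3
    else:
        return 0

# Sample data assumed from the question
df = pd.DataFrame(
    {"TM52_fail": ["2", "1", "-", "1 & 2", "1 & 2 & 3", "-", "-", "3"]}
)

# Original attempt: str_to_number receives a generator object on every call,
# so no string comparison matches, and the result is not stored in the frame.
df["TM52_fail_norm"] = ""
unassigned = df["TM52_fail_norm"].apply(
    lambda x: str_to_number(x for x in df["TM52_fail"])
)
still_empty = (df["TM52_fail_norm"] == "").all()

# Apply the function to the source column and assign the result
df["TM52_fail_norm"] = df["TM52_fail"].apply(str_to_number)
print(df)
```

Here apply passes each cell of TM52_fail to str_to_number one at a time, so no lambda is needed.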
I think you need to cast to string with astype and then apply the function str_to_number:
df['new'] = df['TM52_fail_norm'].astype(str).apply(str_to_number)
print(df)
  TM52_fail_norm  new
0              2    1
1              1    1
2              -    0
3          1 & 2    2
4      1 & 2 & 3    3
5              -    0
6              -    0
7              3    1
Another solution uses map with a dict; at the end you need fillna with 0 and a cast to int:
d = {'1':1,'2':1,'3':1,'1 & 2':2, '2 & 3':2, '1 & 3':2,'1 & 2 & 3':3}
df['new'] = df['TM52_fail_norm'].map(d)
df['new'] = df['new'].fillna(0).astype(int)
print(df)
  TM52_fail_norm  new
0              2    1
1              1    1
2              -    0
3          1 & 2    2
4      1 & 2 & 3    3
5              -    0
6              -    0
7              3    1
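To make the role of fillna concrete, here is a small sketch that keeps the intermediate result of map visible. It assumes the same sample values as above; the variable name mapped is only illustrative. Values that are not keys of the dict (here "-") come back as NaN, which is why they are filled with 0 before the cast to int.

```python
import pandas as pd

df = pd.DataFrame(
    {"TM52_fail_norm": ["2", "1", "-", "1 & 2", "1 & 2 & 3", "-", "-", "3"]}
)
d = {"1": 1, "2": 1, "3": 1,
     "1 & 2": 2, "2 & 3": 2, "1 & 3": 2,
     "1 & 2 & 3": 3}

# Values missing from the dict become NaN at this step
mapped = df["TM52_fail_norm"].map(d)
print(mapped.isna().sum())  # number of rows that had no matching key

# Replace the NaN values with 0 and cast to integers
df["new"] = mapped.fillna(0).astype(int)
print(df)
```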
Timings:
#[800000 rows x 1 columns]
df = pd.concat([df]*100000).reset_index(drop=True)
In [315]: %timeit (jez1(df))
10 loops, best of 3: 63 ms per loop
In [316]: %timeit (df['TM52_fail_norm'].astype(str).apply(str_to_number))
1 loop, best of 3: 518 ms per loop
#http://stackoverflow.com/a/40176883/2901002
In [345]: %timeit (df.TM52_fail_norm.str.count('\d+'))
1 loop, best of 3: 707 ms per loop
def jez1(df):
    d = {'1':1,'2':1,'3':1,'1 & 2':2, '2 & 3':2, '1 & 3':2,'1 & 2 & 3':3}
    df['new'] = df['TM52_fail_norm'].map(d)
    df['new'] = df['new'].fillna(0).astype(int)
    return (df)
print (jez1(df))
TL;DR: df.TM52_fail.str.count('\\d+')
It seems that what you really want is to count the number of digits. Here, pandas' .str accessor methods (docs, summary of .str methods) are really helpful!
I suppose TM52_fail is of dtype str; otherwise you can cast it with .astype(str), as suggested by @jezrael:
# setup
import pandas as pd
df = pd.DataFrame({'TM52_fail':[
    "2", "1", "", "1 & 2", "1 & 2 & 3", "", "", "3"]})
# Use regex \d+ to find 1 or more consecutive digits
# (raw string, so the backslash reaches the regex engine unchanged)
df['TM52_fail_norm2'] = df.TM52_fail.str.count(r'\d+')
Regex: 155 µs per loop
jez1: 999 µs per loop
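As a quick sanity check, the sketch below compares the regex count with the dict-based map on the sample data, and then on a few values that do not appear in the question. Those extra values ("12", "4", "1 & 1") are made up for illustration. The two approaches agree on the expected inputs but diverge on anything else, because \d+ counts runs of digits rather than looking values up.

```python
import pandas as pd

df = pd.DataFrame(
    {"TM52_fail": ["2", "1", "", "1 & 2", "1 & 2 & 3", "", "", "3"]}
)
d = {"1": 1, "2": 1, "3": 1,
     "1 & 2": 2, "2 & 3": 2, "1 & 3": 2,
     "1 & 2 & 3": 3}

# Both approaches on the sample data from the question
by_count = df["TM52_fail"].str.count(r"\d+")
by_map = df["TM52_fail"].map(d).fillna(0).astype(int)
print(by_count.tolist() == by_map.tolist())  # True

# Hypothetical unexpected inputs: the results differ
extra = pd.Series(["12", "4", "1 & 1"])
extra_count = extra.str.count(r"\d+").tolist()           # [1, 1, 2]
extra_map = extra.map(d).fillna(0).astype(int).tolist()  # [0, 0, 0]
print(extra_count, extra_map)
```

If the column can only ever contain the seven combinations listed in the dict, either approach should do; if stray values are possible, the dict makes them show up as 0 instead of being counted.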