Assign values to a column based on another column in the same pandas DataFrame

Question

In a dataframe, I have one column, called TM52_fail:

2
1
-
1 & 2
1 & 2 & 3
-
-
3
etc.

and I would like to create an additional column, called TM52_fail_norm, whose content depends on the content of the column TM52_fail. My attempt (which includes the conditional filling):

def str_to_number(x):
    if x=="1" or x=="2" or x=="3":
        return 1
    elif x=="1 & 2" or x=="2 & 3" or x=="1 & 3":
        return 2
    elif x=="1 & 2 & 3":
        return 3
    else:
        return 0

df['TM52_fail_norm'] = ""
df['TM52_fail_norm'].apply(lambda x: str_to_number(x for x in df['TM52_fail']))

This returns an empty column (I presume as a result of df['TM52_fail_norm'] = "").

Answer 1

I think you need to cast the column to string with astype and then apply the function str_to_number. (In the examples below, the input column is named TM52_fail_norm and the result is written to a column called new.)

df['new'] = df['TM52_fail_norm'].astype(str).apply(str_to_number)
print (df)
  TM52_fail_norm  new
0              2    1
1              1    1
2              -    0
3          1 & 2    2
4      1 & 2 & 3    3
5              -    0
6              -    0
7              3    1
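
For reference, here is a self-contained sketch of the same approach, assuming the input column is named TM52_fail as in the question (the sample values are copied from the question). The attempt in the question leaves the column empty for two reasons: Series.apply returns a new Series rather than modifying the column in place, and the lambda hands str_to_number a generator instead of a single value. Applying the function directly to the source column and assigning the result avoids both problems.

```python
import pandas as pd

def str_to_number(x):
    # Return how many criteria the failure string lists (0 if none match)
    if x in ("1", "2", "3"):
        return 1
    elif x in ("1 & 2", "2 & 3", "1 & 3"):
        return 2
    elif x == "1 & 2 & 3":
        return 3
    return 0

# Sample data taken from the question; the column name is an assumption
df = pd.DataFrame({"TM52_fail": ["2", "1", "-", "1 & 2", "1 & 2 & 3", "-", "-", "3"]})

# apply returns a new Series, so the result has to be assigned to the target column
df["TM52_fail_norm"] = df["TM52_fail"].astype(str).apply(str_to_number)
print(df["TM52_fail_norm"].tolist())  # [1, 1, 0, 2, 3, 0, 0, 1]
```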

Another solution is to map with a dict; afterwards you need fillna(0) and a cast to int:

d = {'1':1,'2':1,'3':1,'1 & 2':2, '2 & 3':2, '1 & 3':2,'1 & 2 & 3':3}

df['new'] = df['TM52_fail_norm'].map(d)
df['new'] = df['new'].fillna(0).astype(int)
print (df)
  TM52_fail_norm  new
0              2    1
1              1    1
2              -    0
3          1 & 2    2
4      1 & 2 & 3    3
5              -    0
6              -    0
7              3    1
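
A short sketch of why the fillna step is needed, again assuming the input column is named TM52_fail: map returns NaN for any value that is not a key of the dictionary (here "-"), which also makes the intermediate result a float column until it is filled and cast back to int.

```python
import pandas as pd

# Sample data taken from the question; the column name is an assumption
df = pd.DataFrame({"TM52_fail": ["2", "1", "-", "1 & 2", "1 & 2 & 3", "-", "-", "3"]})
d = {"1": 1, "2": 1, "3": 1, "1 & 2": 2, "2 & 3": 2, "1 & 3": 2, "1 & 2 & 3": 3}

mapped = df["TM52_fail"].map(d)   # "-" is not a key, so those rows become NaN
print(mapped.isna().sum())        # 3 unmatched rows
print(mapped.dtype)               # float64, because of the NaN values

# Fill the unmatched rows with 0 and restore an integer dtype
df["TM52_fail_norm"] = mapped.fillna(0).astype(int)
print(df["TM52_fail_norm"].tolist())  # [1, 1, 0, 2, 3, 0, 0, 1]
```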

Timings:

#[800000 rows x 1 columns]
df = pd.concat([df]*100000).reset_index(drop=True)

In [315]: %timeit (jez1(df))
10 loops, best of 3: 63 ms per loop

In [316]: %timeit (df['TM52_fail_norm'].astype(str).apply(str_to_number))
1 loop, best of 3: 518 ms per loop

#http://stackoverflow.com/a/40176883/2901002
In [345]: %timeit (df.TM52_fail_norm.str.count('\d+'))
1 loop, best of 3: 707 ms per loop


def jez1(df):
    d = {'1':1,'2':1,'3':1,'1 & 2':2, '2 & 3':2, '1 & 3':2,'1 & 2 & 3':3}

    df['new'] = df['TM52_fail_norm'].map(d)
    df['new'] = df['new'].fillna(0).astype(int)
    return (df)

print (jez1(df))
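
The %timeit lines above only work inside IPython. A rough equivalent for a plain Python script, using the standard-library timeit module, might look like the sketch below. The helper names use_map, use_apply and use_count are made up for this example, the column is assumed to be named TM52_fail, and the row count is smaller than in the timings above. The numbers will differ between machines and pandas versions, so no expected timings are shown.

```python
import timeit
import pandas as pd

d = {"1": 1, "2": 1, "3": 1, "1 & 2": 2, "2 & 3": 2, "1 & 3": 2, "1 & 2 & 3": 3}

def str_to_number(x):
    if x == "1" or x == "2" or x == "3":
        return 1
    elif x == "1 & 2" or x == "2 & 3" or x == "1 & 3":
        return 2
    elif x == "1 & 2 & 3":
        return 3
    return 0

# Hypothetical wrappers around the three approaches discussed in this thread
def use_map(s):
    return s.map(d).fillna(0).astype(int)

def use_apply(s):
    return s.astype(str).apply(str_to_number)

def use_count(s):
    return s.str.count(r"\d+")

base = pd.Series(["2", "1", "-", "1 & 2", "1 & 2 & 3", "-", "-", "3"], name="TM52_fail")
s = pd.concat([base] * 10000).reset_index(drop=True)  # 80000 rows

for func in (use_map, use_apply, use_count):
    # Best of three single runs, reported in milliseconds
    seconds = min(timeit.repeat(lambda: func(s), number=1, repeat=3))
    print(f"{func.__name__}: {seconds * 1000:.1f} ms")
```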

Answer 2

TL;DR: df.TM52_fail.str.count('\\d+')

It seems that what you really want is to count the number of digits. Here, pandas' .str accessor methods (docs, summary of .str methods) are really helpful!

I suppose TM52_fail is of dtype str; otherwise you can cast it with .astype(str), as suggested by @jezrael:

# setup
import pandas as pd
df = pd.DataFrame({'TM52_fail':[
    "2", "1", "", "1 & 2", "1 & 2 & 3", "", "", "3"]})

# Use the regex \d+ to count runs of one or more consecutive digits
# (raw string, so the backslash is not treated as a string escape)
df['TM52_fail_norm2'] = df.TM52_fail.str.count(r'\d+')
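
One thing worth checking before relying on the count: it agrees with the dictionary lookup only as long as every number in the column is one of the listed criteria. The sketch below compares the two on a few of the sample values plus two made-up ones ("12" and "4 & 5", which do not appear in the question) to show where they diverge.

```python
import pandas as pd

d = {"1": 1, "2": 1, "3": 1, "1 & 2": 2, "2 & 3": 2, "1 & 3": 2, "1 & 2 & 3": 3}

# The last two values are hypothetical and only illustrate the difference
s = pd.Series(["2", "1 & 2", "1 & 2 & 3", "", "12", "4 & 5"])

by_count = s.str.count(r"\d+")           # counts runs of digits, whatever they are
by_map = s.map(d).fillna(0).astype(int)  # only recognises the listed combinations

print(pd.DataFrame({"value": s, "count": by_count, "map": by_map}))
# "12" is one run of digits, so count gives 1 while map gives 0
# "4 & 5" has two runs of digits, so count gives 2 while map gives 0
```

If the column can only ever contain the combinations from the question, the two approaches return the same result and the regex version is the shorter one to write.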

Timings

Regex: 155 µs per loop
 jez1: 999 µs per loop

2 answers

Answer 1: 2, accepted, 2016-10-21 10:54:33
Answer 2: 1, 2016-10-21 12:37:28
