
Assign value to a column based on other columns from the same pandas dataframe

In a dataframe, I have one column, called TM52_fail:

2
1
-
1 & 2
1 & 2 & 3
-
-
3
etc.

and I would like to create an additional column, called TM52_fail_norm, whose content depends on the content of the column TM52_fail. My attempt (which includes the conditional filling):

def str_to_number(x):
    if x=="1" or x=="2" or x=="3":
        return 1
    elif x=="1 & 2" or x=="2 & 3" or x=="1 & 3":
        return 2
    elif x=="1 & 2 & 3":
        return 3
    else:
        return 0

df['TM52_fail_norm'] = ""
df['TM52_fail_norm'].apply(lambda x: str_to_number(x for x in df['TM52_fail']))

returns an empty column (I presume as a result of df['TM52_fail_norm'] = "").

I think you need to cast to string with astype and then apply the function str_to_number:

df['new'] = df['TM52_fail_norm'].astype(str).apply(str_to_number)
print (df)
  TM52_fail_norm  new
0              2    1
1              1    1
2              -    0
3          1 & 2    2
4      1 & 2 & 3    3
5              -    0
6              -    0
7              3    1
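To make the difference from the original attempt concrete, here is a minimal sketch. It assumes the source column is named TM52_fail as in the question (the answer's example names it TM52_fail_norm), and the variable names `discarded` and `before_fix` are only illustrative. In the attempt, apply runs over the empty-string column, the lambda hands a generator object to str_to_number (so every comparison fails and 0 is returned), and the resulting Series is never assigned.

```python
import pandas as pd


def str_to_number(x):
    # Same mapping as in the question, written with membership tests.
    if x in ("1", "2", "3"):
        return 1
    elif x in ("1 & 2", "2 & 3", "1 & 3"):
        return 2
    elif x == "1 & 2 & 3":
        return 3
    return 0


df = pd.DataFrame(
    {"TM52_fail": ["2", "1", "-", "1 & 2", "1 & 2 & 3", "-", "-", "3"]}
)

# Original attempt: apply() iterates over the empty-string column, the lambda
# passes a generator (not a string) to str_to_number, and the returned Series
# is thrown away because nothing is assigned.
df["TM52_fail_norm"] = ""
discarded = df["TM52_fail_norm"].apply(
    lambda x: str_to_number(x for x in df["TM52_fail"])
)
before_fix = df["TM52_fail_norm"].tolist()  # still all empty strings

# Fix: apply the function to the source column and assign the result.
df["TM52_fail_norm"] = df["TM52_fail"].astype(str).apply(str_to_number)
print(df)
```

Series.apply returns a new Series rather than modifying the column in place, which is why the assignment matters.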

Another solution uses map with a dict; afterwards, fill the unmatched values with fillna(0) and cast to int:

d = {'1':1,'2':1,'3':1,'1 & 2':2, '2 & 3':2, '1 & 3':2,'1 & 2 & 3':3}

df['new'] = df['TM52_fail_norm'].map(d)
df['new'] = df['new'].fillna(0).astype(int)
print (df)
  TM52_fail_norm  new
0              2    1
1              1    1
2              -    0
3          1 & 2    2
4      1 & 2 & 3    3
5              -    0
6              -    0
7              3    1

Timings:

#[800000 rows x 1 columns]
df = pd.concat([df]*100000).reset_index(drop=True)

In [315]: %timeit (jez1(df))
10 loops, best of 3: 63 ms per loop

In [316]: %timeit (df['TM52_fail_norm'].astype(str).apply(str_to_number))
1 loop, best of 3: 518 ms per loop

#http://stackoverflow.com/a/40176883/2901002
In [345]: %timeit (df.TM52_fail_norm.str.count(r'\d+'))
1 loop, best of 3: 707 ms per loop


def jez1(df):
    d = {'1':1,'2':1,'3':1,'1 & 2':2, '2 & 3':2, '1 & 3':2,'1 & 2 & 3':3}

    df['new'] = df['TM52_fail_norm'].map(d)
    df['new'] = df['new'].fillna(0).astype(int)
    return (df)

print (jez1(df))
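The timings above come from an IPython session and will differ between machines and pandas versions. The following is a rough sketch of how the comparison could be reproduced in a plain script with the standard timeit module, assuming a smaller frame than the 800000 rows used above. The helper names by_apply, by_map and by_count are made up for this example. It also checks that the three approaches agree on this data before timing them.

```python
import timeit

import pandas as pd

d = {'1': 1, '2': 1, '3': 1, '1 & 2': 2, '2 & 3': 2, '1 & 3': 2, '1 & 2 & 3': 3}


def str_to_number(x):
    # Row-wise lookup; anything not in the dict counts as 0.
    return d.get(x, 0)


def by_apply(s):
    return s.astype(str).apply(str_to_number)


def by_map(s):
    return s.map(d).fillna(0).astype(int)


def by_count(s):
    return s.str.count(r'\d+')


base = pd.DataFrame(
    {'TM52_fail_norm': ['2', '1', '-', '1 & 2', '1 & 2 & 3', '-', '-', '3']}
)
# 8000 rows here (an assumption to keep the run short).
df = pd.concat([base] * 1000, ignore_index=True)
col = df['TM52_fail_norm']

approaches = {'apply': by_apply, 'map': by_map, 'str.count': by_count}

# Sanity check: all approaches should produce the same values on this data.
results = {name: fn(col).tolist() for name, fn in approaches.items()}
all_equal = results['apply'] == results['map'] == results['str.count']
print('all approaches agree:', all_equal)

# Timings are printed, not asserted, since they are hardware dependent.
for name, fn in approaches.items():
    seconds = timeit.timeit(lambda fn=fn: fn(col), number=5)
    print(f'{name}: {seconds / 5 * 1e3:.2f} ms per call')
```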

TL;DR: df.TM52_fail.str.count('\\d+')

It seems that what you really want is to count the number of digits. Here, pandas' .str accessor methods (docs, summary of .str methods) are really helpful!

I suppose TM52_fail is of dtype str; otherwise you can cast it with .astype(str), as suggested by @jezrael:

# setup
import pandas as pd
df = pd.DataFrame({'TM52_fail':[
    "2", "1", "", "1 & 2", "1 & 2 & 3", "", "", "3"]})

# Use regex \d+ to find 1 or more consecutive digits
df['TM52_fail_norm2'] = df.TM52_fail.str.count(r'\d+')
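As a quick check of what the regex count produces on this setup, and where it could differ from the dict-based answer, here is a small sketch. The `edge` inputs are made-up values that do not appear in the question.

```python
import pandas as pd

df = pd.DataFrame({'TM52_fail': [
    "2", "1", "", "1 & 2", "1 & 2 & 3", "", "", "3"]})

# Count runs of one or more consecutive digits in each string.
df['TM52_fail_norm2'] = df['TM52_fail'].str.count(r'\d+')
counts = df['TM52_fail_norm2'].tolist()
print(counts)  # [1, 1, 0, 2, 3, 0, 0, 1]

# Hypothetical edge cases: "12" is a single run of digits, and repeated
# numbers are each counted, whereas the dict lookup would give 0 for both.
edge = pd.Series(["12", "1 & 1", "-"]).str.count(r'\d+').tolist()
print(edge)  # [1, 2, 0]
```

If only the exact strings listed in the question should count, the dict-based map is the stricter choice.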

Timings

Regex: 155 µs per loop
 jez1: 999 µs per loop
