
Pandas: extracting values from a dataframe based on a condition

I'm trying to extract the part of a string before a dash in some rows of a pandas DataFrame. The problem is that when I use the extract() function, it extracts the part of the string before the dash but inserts NaN in rows where no dash is present.

Data example:

I2311-A45
Z13A-SA87 
CSSSAA1-4 
LKJ3B-15
1AAAZ0-14
ASHENSKFR
ASD
AFSDFGRE

So I have df['values'], which is the example column. My attempts are:

df['values'] = df['values'].str.extract('(.*)-')

Output:

I2311
Z13A 
CSSSAA1 
LKJ3B
1AAAZ0
NaN
NaN
NaN

and it gives me 3 NaN values instead of

ASHENSKFR
ASD
AFSDFGRE

Next, I tried using df.loc conditions and the apply() function with a lambda, but both fail with the same exception:

The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

df['values'] = df['values'].apply(lambda x: df['values'].str.extract('(.*)-') if df['values'].str.contains('-') else None)

Thank you in advance for your help!
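A note on the exception above: inside apply(), the lambda is called once per element, but the attempted lambda ignores its argument x and tests df['values'].str.contains('-'), which is a whole boolean Series. Python's `if` cannot reduce a Series to a single True/False, hence the "truth value is ambiguous" error. A minimal sketch of an element-wise version follows, assuming the column holds only strings (no missing values); the shortened sample data is for illustration only.

```python
import pandas as pd

# Small illustrative sample based on the question's data
df = pd.DataFrame({"values": ["I2311-A45", "Z13A-SA87", "ASHENSKFR", "ASD"]})

# The lambda receives one string at a time, so the condition must be
# written against that string (x), not against the whole column.
result = df["values"].apply(lambda x: x.split("-")[0] if "-" in x else x)

print(result.tolist())  # ['I2311', 'Z13A', 'ASHENSKFR', 'ASD']
```

This works, but it runs a Python function per row; the vectorized string methods are usually the more idiomatic choice.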

You can simply use Series.str.split. This splits the value where - is present and otherwise leaves the value as is.

In [134]: df['values'].str.split('-').str[0]
Out[134]: 
0        I2311
1         Z13A
2      CSSSAA1
3        LKJ3B
4       1AAAZ0
5    ASHENSKFR
6          ASD
7     AFSDFGRE
Name: values, dtype: object
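For reference, here is a self-contained sketch of the same idea that can be run outside an interactive session, together with one possible alternative that keeps str.extract() and falls back to the original value where the pattern does not match. The variable names by_split and by_extract are illustrative, and n=1 is an added assumption that only the first dash should be used as the separator.

```python
import pandas as pd

values = ["I2311-A45", "Z13A-SA87", "CSSSAA1-4", "LKJ3B-15",
          "1AAAZ0-14", "ASHENSKFR", "ASD", "AFSDFGRE"]
df = pd.DataFrame({"values": values})

# Split on the first dash only (n=1) and keep the left-hand piece.
# Rows without a dash produce a one-element list, so .str[0] returns
# the original string unchanged.
by_split = df["values"].str.split("-", n=1).str[0]

# Alternative: str.extract() yields NaN where the regex does not match,
# so fill those rows with the original values.
by_extract = df["values"].str.extract(r"(.*)-", expand=False).fillna(df["values"])

print(by_split.tolist())
print(by_extract.tolist())
```

On this data both approaches give the same result. They can differ when a value contains more than one dash: the greedy pattern (.*)- captures everything up to the last dash, while splitting with n=1 keeps only the text before the first one.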
