Pandas extracting values from dataframe based on condition
I'm trying to extract the part of a string before a dash in some rows of a pandas DataFrame. The problem is that when I use the extract() function, it extracts the part of the string before the dash but inserts NaN in rows where no dash is present.
Data example:
I2311-A45
Z13A-SA87
CSSSAA1-4
LKJ3B-15
1AAAZ0-14
ASHENSKFR
ASD
AFSDFGRE
So I have df['values'], which is the example column. My attempts are:
df['values'] = df['values'].str.extract('(.*)-')
Output:
I2311
Z13A
CSSSAA1
LKJ3B
1AAAZ0
NaN
NaN
NaN
and it gives me 3 NaN values instead of
ASHENSKFR
ASD
AFSDFGRE
Next I tried using df.loc conditions and the apply() function with a lambda, but both fail with the same exception:
The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
df['values'] = df['values'].apply(lambda x: df['values'].str.extract('(.*)-') if df['values'].str.contains('-') else None)
Thank you in advance for your help!
You can simply use Series.str.split. This splits the value where - is present and otherwise leaves the value as is.
In [134]: df['values'].str.split('-').str[0]
Out[134]:
0 I2311
1 Z13A
2 CSSSAA1
3 LKJ3B
4 1AAAZ0
5 ASHENSKFR
6 ASD
7 AFSDFGRE
Name: values, dtype: object
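
For anyone who wants to reproduce this, here is a minimal self-contained sketch. The DataFrame is rebuilt from the sample data in the question, and the variable names (by_split, by_extract, by_apply) are only illustrative. It shows the split approach next to two alternatives that should give the same result on this data: keeping extract() but filling the NaN rows from the original column, and a per-element apply().

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "values": [
        "I2311-A45", "Z13A-SA87", "CSSSAA1-4", "LKJ3B-15",
        "1AAAZ0-14", "ASHENSKFR", "ASD", "AFSDFGRE",
    ]
})

# 1. Split on the dash and keep the first piece.
#    Rows without a dash yield a one-element list, so the value is unchanged.
by_split = df["values"].str.split("-").str[0]

# 2. Keep extract(), then fall back to the original value where it found no match.
#    expand=False returns a Series (not a DataFrame) for a single capture group.
by_extract = df["values"].str.extract(r"(.*)-", expand=False).fillna(df["values"])

# 3. Per-element apply: the lambda receives one string at a time,
#    so it must work on x, not on the whole df["values"] column.
by_apply = df["values"].apply(lambda x: x.split("-")[0])

print(by_split.tolist())
```

The lambda in the question fails because df['values'].str.contains('-') inside it is a whole boolean Series, and using a Series in an if statement raises the "truth value of a Series is ambiguous" error. One difference to keep in mind: the pattern (.*)- is greedy, so for a value with several dashes it captures everything up to the last dash, while split('-').str[0] stops at the first one. The sample data has at most one dash per value, so the three results agree here.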