How to filter a DataFrame column by values that start with an integer in Python?

Question

I used the following code:

data_snp_old = data_snp_age[data_snp_age['Age'].str.contains('15+', na=False)]
data_snp_old = data_snp_age.filter(regex='^15+', axis=0)

The code doesn't work properly: it does filter, but some rows with '<15' entries still come through.
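The second line of the code above fails for a separate reason: according to the pandas documentation, DataFrame.filter is applied to the row or column labels, not to the cell contents, so with axis=0 the pattern is matched against the index. A minimal sketch, with hypothetical string index labels chosen only to make the effect visible:

```python
import pandas as pd

# Hypothetical index labels, chosen so the effect of filter() is visible.
df = pd.DataFrame({'Age': ['15+', '<15', '15+']},
                  index=['15a', '15b', 'x15'])

# With axis=0, DataFrame.filter matches the regex against the index labels,
# never against the values stored in the 'Age' column.
by_label = df.filter(regex='^15+', axis=0)
print(by_label)
```

The row holding '<15' survives because its label '15b' matches, which is the same symptom described in the question.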

Answer 1

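The problem here is the pattern you pass to contains(). By default it is not treated as a literal character sequence but as a regular expression. In regex syntax, '+' means "one or more of the preceding character", so '15+' matches a '1' followed by one or more '5's. Both '15+' and '<15' contain '15', so both kinds of rows match.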
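A quick way to see this outside pandas is Python's re module. This is a minimal sketch; the sample strings are made up for illustration:

```python
import re

# In regex syntax, '15+' means: the character '1' followed by one or
# more '5' characters. The '+' is a quantifier, not a literal plus sign.
pattern = re.compile('15+')

samples = ['15+', '<15', '155', '<10']

# re.search looks for a match anywhere in the string; Series.str.contains
# with regex=True applies the same kind of search to each element.
results = {value: pattern.search(value) is not None for value in samples}
print(results)
# → {'15+': True, '<15': True, '155': True, '<10': False}
```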

Function definition: Series.str.contains(pat, case=True, flags=0, na=nan, regex=True)

Parameters:
pat : Character sequence or regular expression.
case : If True, case sensitive.
flags : Flags to pass through to the re module, e.g. re.IGNORECASE.
na : Fill value for missing values.
regex : If True, assumes the pat is a regular expression.

Returns: Series or Index of boolean values

Here is what you can do:

import pandas as pd
# Make a toy dataset.
data = {'Category': ['Age', 'Age', 'Age', 'Age', 'Age'], 'Age': ['15+', '<15', '15+', '<15', '15+']}
df = pd.DataFrame(data=data)
print(df)
# Output: 
  Category  Age
0      Age  15+
1      Age  <15
2      Age  15+
3      Age  <15
4      Age  15+

Here is the important part:

df_new = df[df['Age'].str.contains('15+', na=False, regex=False)]
# regex=False tells contains() to treat '15+' as a literal string
# instead of a regular expression (which is the default).
print(df_new)
# Output:
  Category  Age
0      Age  15+
2      Age  15+
4      Age  15+
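If you would rather keep regex matching enabled, for example to combine the literal text with other pattern syntax, one alternative (not part of the original answer) is to escape the pattern with Python's re.escape first. A minimal sketch; the toy column with a missing value is an assumption added to show na=False at work:

```python
import re

import pandas as pd

df = pd.DataFrame({'Age': ['15+', '<15', '15+', None]})

# re.escape turns '15+' into the pattern 15\+, so the '+' is matched
# literally even though regex=True (the default) is still in effect.
literal_pattern = re.escape('15+')

# na=False treats the missing value as "no match" instead of NaN.
mask = df['Age'].str.contains(literal_pattern, na=False)
print(df[mask])
```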

Or:

df_new = df[df['Age'].str.contains(r'\d{2}\+', na=False)]
# The regex above matches two digits followed by a literal '+'.
# (No capture group is needed; with one, pandas warns that the pattern has match groups.)
print(df_new)
# Output:
  Category  Age
0      Age  15+
2      Age  15+
4      Age  15+
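Since the title asks about values that start with an integer, an anchored check may state that intent more directly than contains(), which matches anywhere in the string. A minimal sketch on the same toy data, using pandas' str.startswith for a literal prefix and str.match for a regex anchored at the beginning:

```python
import pandas as pd

df = pd.DataFrame({'Category': ['Age'] * 5,
                   'Age': ['15+', '<15', '15+', '<15', '15+']})

# str.startswith checks a literal prefix, so no regex escaping is needed.
starts_with_15 = df[df['Age'].str.startswith('15', na=False)]

# str.match is anchored at the start of each string (like re.match),
# so r'\d' keeps the rows whose value begins with any digit.
starts_with_digit = df[df['Age'].str.match(r'\d', na=False)]

print(starts_with_15)
print(starts_with_digit)
```

Both filters keep rows 0, 2 and 4 here; str.match(r'\d') would also keep values such as '18+', so choose whichever matches how strict the filter needs to be.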

Here is something to read for further reference:

pandas.Series.str.contains()

Hope this helps, cheers!


Answer 1, score 0, 2020-01-26 07:26:51
