Removing unwanted characters from DataFrame values in Pandas

Question

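I have the following DataFrame, which contains locus/gene names from a multi-genome alignment.

However, I am trying to get just the full list of loci/names, without the coordinates.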

    Tuberculosis_locus  Smagmatis_locus                  H37RA_locus              Bovis_locus
0   0:Rv0001:1-1524     1:MSMEG_RS33460:6986600-6988114  2:MRA_RS00005:1-1524     3:BQ2027_RS00005:1-1524
1   0:Rv0002:2052-3260  1:MSMEG_RS00005:499-1692         2:MRA_RS00010:2052-3260  3:BQ2027_RS00010:2052-3260
2   0:Rv0003:3280-4437  1:MSMEG_RS00015:2624-3778        2:MRA_RS00015:3280-4437  3:BQ2027_RS00015:3280-4437

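To avoid problems with empty cells, I fill them with "N/A" and then strip the unwanted characters. But I get exactly the same result back; nothing seems to happen.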

for value in orthologs['Tuberculosis_locus']:
    orthologs['Tuberculosis_locus'] = orthologs['Tuberculosis_locus'].fillna("N/A")
    orthologs['Tuberculosis_locus'] = orthologs['Tuberculosis_locus'].map(lambda x: x.lstrip('\d:').rstrip(':\d+'))

Any idea what I'm doing wrong? I want the following output:

    Tuberculosis_locus  Smagmatis_locus  H37RA_locus  Bovis_locus
0   Rv0001              MSMEG_RS33460    MRA_RS00005  BQ2027_RS00005
1   Rv0002              MSMEG_RS00005    MRA_RS00010  BQ2027_RS00010
2   Rv0003              MSMEG_RS00015    MRA_RS00015  BQ2027_RS00015

Answer 1

Split on ':' with at most two splits, then take the second element, for example:

df.applymap(lambda v: v.split(':', 2)[1])
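A self-contained sketch of the same split-and-take-the-second-piece idea on the sample data from the question. It swaps applymap for the column-wise Series.str.split(":", n=2).str[1], because DataFrame.applymap is deprecated from pandas 2.1 onward (renamed DataFrame.map); for these string values the result should be the same. One difference worth knowing: .str[1] yields NaN for missing cells or for values without a ':' (such as "N/A"), where v.split(':', 2)[1] would raise.

```python
import pandas as pd

orthologs = pd.DataFrame({
    "Tuberculosis_locus": ["0:Rv0001:1-1524", "0:Rv0002:2052-3260", "0:Rv0003:3280-4437"],
    "Smagmatis_locus": ["1:MSMEG_RS33460:6986600-6988114", "1:MSMEG_RS00005:499-1692",
                        "1:MSMEG_RS00015:2624-3778"],
    "H37RA_locus": ["2:MRA_RS00005:1-1524", "2:MRA_RS00010:2052-3260", "2:MRA_RS00015:3280-4437"],
    "Bovis_locus": ["3:BQ2027_RS00005:1-1524", "3:BQ2027_RS00010:2052-3260",
                    "3:BQ2027_RS00015:3280-4437"],
})

# For every column: split "index:name:start-end" on ':' (at most 2 splits)
# and keep the second piece, i.e. the locus name.
names = orthologs.apply(lambda col: col.str.split(":", n=2).str[1])
print(names)
```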

Answer 2

def clean(x):
    # Keep only the part between the first and second ':' (the locus name)
    x = x.split(':')[1].strip()
    return x

orthologs = orthologs.applymap(clean)

This should work.

Explanation:

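applymap applies a function to every element of the whole DataFrame; apply, called on a single column (a Series), applies it to every element of that column. (In pandas 2.1 and later, applymap is deprecated and renamed DataFrame.map.)

clean is the function you want to apply to each entry of the DataFrame. Note that when you pass it to applymap or apply, you leave off the (x): write applymap(clean), not applymap(clean(x)).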
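To make the whole-frame versus single-column distinction concrete, here is a sketch with a slightly more defensive clean. It is a hypothetical variant, not Answer 2's exact code: it leaves missing values and entries without a ':' (such as the question's "N/A" placeholder) unchanged, whereas x.split(':')[1] raises IndexError on "N/A". It assumes DataFrame.map exists (pandas 2.1+) and falls back to applymap on older versions.

```python
import pandas as pd

def clean(x):
    # Missing cells (NaN/None) are returned as-is instead of failing on .split
    if not isinstance(x, str):
        return x
    parts = x.split(":")
    # "index:name:start-end" -> "name"; values without ':' (e.g. "N/A") pass through
    return parts[1].strip() if len(parts) > 1 else x

orthologs = pd.DataFrame({
    "Tuberculosis_locus": ["0:Rv0001:1-1524", "N/A"],
    "H37RA_locus": ["2:MRA_RS00005:1-1524", "2:MRA_RS00010:2052-3260"],
})

# Element-wise over the whole DataFrame: DataFrame.map on pandas >= 2.1,
# DataFrame.applymap before that. Pass the function object itself, not clean(x).
elementwise = orthologs.map if hasattr(pd.DataFrame, "map") else orthologs.applymap
whole_frame = elementwise(clean)

# Element-wise over a single column (a Series).
one_column = orthologs["Tuberculosis_locus"].apply(clean)

print(whole_frame)
print(one_column)
```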

2 solutions

Solution 1
Score 1, accepted, 2022-02-12 17:34:28

Solution 2
Score 0, 2022-02-12 17:46:23