Regex to replace comma-separated strings with their sum in a pandas dataframe

Question

I have a tab-separated data frame that looks like this (for example):

       A                                 B                      C
gene1  AHX21832.1                        EEL39984.1,ARO60330.1  EEL39984.1
gene2  EEL39984.1,ARO60330.1             ARO60330.1             ARO60330.1
gene3  AYF09030.1,EEL37774.1,AQY42173.1  AQY42173.1             AQY42173.1

The following script works well on a list:

values = ["AHX21832.1", "EEL39984.1,ARO60330.1", "AYF09030.1,EEL37774.1,AQY42173.1"]

script (repl.it link)

How can I apply this script to my pandas data frame, given that there is no re.findall in pandas?

Answer 1

Take a look at https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.findall.html. It looks like you can do the equivalent of re.findall on each column of a dataframe.

for column, data in df.items():  # DataFrame.iteritems() was removed in pandas 2.0
    res = data.str.findall(r"[A-Z0-9]\.(\d+)")
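As a rough sketch of how this could be applied to the whole data frame from the question: the block below rebuilds that frame by hand, runs Series.str.findall on every column, and then replaces each cell with a sum. It uses df.apply rather than an explicit loop, and the names `pattern`, `matches` and `sums` are made up for illustration. It also assumes that "sum" means the sum of the numbers captured by the regex in each cell, which the question does not state explicitly.

```python
import pandas as pd

# Data frame rebuilt by hand from the example in the question.
df = pd.DataFrame(
    {
        "A": ["AHX21832.1", "EEL39984.1,ARO60330.1", "AYF09030.1,EEL37774.1,AQY42173.1"],
        "B": ["EEL39984.1,ARO60330.1", "ARO60330.1", "AQY42173.1"],
        "C": ["EEL39984.1", "ARO60330.1", "AQY42173.1"],
    },
    index=["gene1", "gene2", "gene3"],
)

pattern = r"[A-Z0-9]\.(\d+)"

# Run Series.str.findall on every column; each cell becomes a list of captured groups.
matches = df.apply(lambda col: col.str.findall(pattern))

# Assumption: "sum" means the sum of the captured numbers in each cell.
sums = matches.apply(lambda col: col.map(lambda found: sum(int(n) for n in found)))
print(sums)
```

Because every ID in the example ends in ".1", the sum in each cell happens to equal the number of IDs in that cell.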

So for the code you posted in your repl.it link, you could get the same results by doing:

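import pandas as pd

values = pd.Series(["AHX21832.1",
                    "EEL39984.1,ARO60330.1",
                    "AYF09030.1,EEL37774.1,AQY42173.1"])

# Raw string so the regex escapes are not treated as string escapes.
res = values.str.findall(r"[A-Z0-9]\.(\d+)")

for x in res:
    print("Found", x)
# res.shape[0] is the number of strings in the Series.
print("total", res.shape[0])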
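Note that res.shape[0] is the number of rows in the Series, not the number of individual matches. If the total is meant to count every match, one possible way (a sketch, with `per_row` and `total_matches` as illustrative names) is to take the length of each result list with Series.str.len and add those up:

```python
import pandas as pd

values = pd.Series(["AHX21832.1",
                    "EEL39984.1,ARO60330.1",
                    "AYF09030.1,EEL37774.1,AQY42173.1"])

res = values.str.findall(r"[A-Z0-9]\.(\d+)")

# Series.str.len also works on list elements: number of matches per string.
per_row = res.str.len()

# Total number of matches across all strings.
total_matches = int(per_row.sum())

print(per_row.tolist())
print("total", total_matches)
```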

Answer 1, 2020-05-20 22:56:22