Regular expression to replace strings separated by commas with their sum in a pandas DataFrame
I have a tab-separated data frame which looks like this (for example):
A B C
gene1 AHX21832.1 EEL39984.1,ARO60330.1 EEL39984.1
gene2 EEL39984.1,ARO60330.1 ARO60330.1 ARO60330.1
gene3 AYF09030.1,EEL37774.1,AQY42173.1 AQY42173.1 AQY42173.1
The following script works well on a list:
values = ["AHX21832.1", "EEL39984.1,ARO60330.1", "AYF09030.1,EEL37774.1,AQY42173.1"]
How can I implement this script on my pandas data frame, given that there is no re.findall in pandas?
Take a look at https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.findall.html . It looks like it is possible to do the equivalent of re.findall on a dataframe by calling Series.str.findall on each column:
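# DataFrame.iteritems() was removed in pandas 2.0; items() is the equivalent.
for column, data in df.items():
    # One list of captured digit groups per cell of this column.
    res = data.str.findall(r"[A-Z0-9]\.(\d+)")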
So for the code you posted in your repl.it link, you could get the same results by doing:
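import pandas as pd

values = pd.Series(["AHX21832.1",
                    "EEL39984.1,ARO60330.1",
                    "AYF09030.1,EEL37774.1,AQY42173.1"])
# Raw string so that \. and \d reach the regex engine unchanged.
res = values.str.findall(r"[A-Z0-9]\.(\d+)")
for x in res:
    print("Found", x)
# Number of rows processed, not the number of matches.
print("total", res.shape[0])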
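The same idea can be extended from one Series to the whole data frame. The sketch below is only one possible reading of the question: it assumes that "sum" in the title means adding up the integers captured after each dot, since the original script is not shown in full. The names `PATTERN`, `sum_matches` and `totals` are made up for illustration, and the frame is rebuilt by hand from the sample rows with the gene names used as the index.

```python
import pandas as pd

# Sample data from the question; gene names are used as the index (an assumption).
df = pd.DataFrame(
    {
        "A": ["AHX21832.1", "EEL39984.1,ARO60330.1", "AYF09030.1,EEL37774.1,AQY42173.1"],
        "B": ["EEL39984.1,ARO60330.1", "ARO60330.1", "AQY42173.1"],
        "C": ["EEL39984.1", "ARO60330.1", "AQY42173.1"],
    },
    index=["gene1", "gene2", "gene3"],
)

# Same pattern as in the answer: capture the digits that follow a dot.
PATTERN = r"[A-Z0-9]\.(\d+)"


def sum_matches(column: pd.Series) -> pd.Series:
    """Replace each cell with the sum of the integers captured by PATTERN."""
    matches = column.str.findall(PATTERN)  # one list of digit strings per cell
    return matches.apply(lambda found: sum(int(n) for n in found))


# DataFrame.apply calls sum_matches once per column and reassembles a frame.
totals = df.apply(sum_matches)
print(totals)
```

Because every accession in the sample ends in `.1`, each total here equals the number of comma-separated entries in the cell. If a plain count is what you need, `column.str.count(",") + 1` may be simpler.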