
How to extract dates and all of the data following them using re.findall in python

I have a string, and I want to use regular expressions to pull some of the information out of it.

The code below has two problems that I'm aware of: 1) it fails to capture the date "20120623", for a reason I don't fully understand (perhaps the compiled regex is not doing what I expect it to do?), and 2) the pattern I'm using to mark where each match ends won't work for the last match in the string.

I may revise the title of this question, depending on what the problem/solution ends up being.

The current code:

import re

s='20120622 10.3 -84.8$Sabulodes colombiata;Clemensia cincinnata;Gn sp_san_luis_1037;Glena mopsaria;Euphyia sp_group_san_luis;Thysanopyga carfinia;Glena mopsaria;Eusarca minucia;Euphyia sp_group_san_luis;Scopula compensata;Gn sp_san_luis_1028;Gn sp_san_luis_7003;Trygodes amphion;Phyllodonta latrata;Gn sp_san_luis_1003;Idaea sp_group_san_luis;Gn sp_san_luis_1002;Melanolophia sp_san_luis;Hemiceras pernubila;Leucula meganira;Oospila athena;Gn sp_san_luis_6001;Epimecis semicompleta;Gn sp_san_luis_1004;Glena mopsaria;Euphyia sp_group_san_luis;Iridopsis validaria;Eusarca asteria--cayennaria;Disphragis proba;Trygodes amphion;Gn sp_san_luis_8022;Melanolophia sp_san_luis;Melanolophia sp_san_luis;Melanolophia sp_san_luis;Idaea sp_group_san_luis;Gn sp_san_luis_1013;Gn sp_san_luis_7001;Disphragis proba;Gn sp_san_luis_1003;Eois sp_san_luis_b;Prochoerodes striata;Nola sp_san_luis_a;Oospila venezuelata;Eois sp_san_luis_b;Oospila venezuelata;Glena mopsaria;Idaea sp_group_san_luis;Oospila venezuelata;Gn sp_san_luis_1002;Phyllodonta latrata;Virbia sp_san_luis_a;Oospila venezuelata;Hemiceras pernubila;Pantherodes conglomerata;Nephodia betala;Melanolophia sp_san_luis;Herbita medama--medona;Euphyia sp_group_san_luis;Idaea sp_group_san_luis;Glena mopsaria;Pantherodes conglomerata;Glena mopsaria;Eois sp_san_luis_k;Melanolophia sp_san_luis;Idaea sp_group_san_luis;Eois sp_san_luis_b;Idaea sp_group_san_luis;Oospila venezuelata;Gn sp_san_luis_1010;Gn sp_san_luis_7001;Eois sp_san_luis_k;Hemiceras pernubila;Dyspteris tenuivitta;Macaria pernicata;Hemiceras pernubila;Hymenia perspectalis;Oxydia sp_san_luis_c;Cliniodes opalalis;Gn sp_san_luis_1042;Oospila athena;Eois sp_san_luis_b;Hemiceras rufescens;Glena mopsaria;Gn sp_san_luis_1011;Phyllodonta latrata;Simopteryx torquataria;Perasia helvina;Nola sp_san_luis_a;Gn sp_san_luis_2023;Nephodia auxesia;Nephodia betala;Glena mopsaria;Ametris nitocris;Gn sp_san_luis_1003;Glena mopsaria;Hymenia perspectalis;Phyllodonta latrata;Eusarca 
asteria--cayennaria;Disphragis proba;Gn sp_san_luis_2028;Ptychamalia sp_san_luis_a;Nola sp_san_luis_a;Anticla antica;Nola sp_san_luis_a;Semaeopus sabuloides;Simopteryx torquataria;Gn sp_san_luis_1002;Phyllodonta latrata;Glena mopsaria;Gizma undilinealis;Eois sp_san_luis_d;Oxydia bilinea;Hemiceras pernubila;Hemiceras pernubila;Thysanopyga amarantha;Plusiodonta sp_san_luis_a;Gn sp_san_luis_1011;Cyclophora sp_san_luis_a;Bleptina caradrinalis;Opharus rudis;Melanolophia sp_san_luis;Schrankia macula;Leucula meganira;Iridopsis lurida--oberthuri--herse;Schrankia macula;Letis buteo;Scopula sp_group_san_luis;Herbita amicaria;Gn sp_san_luis_1013;Adhemarius ypsilon;Adhemarius ypsilon;Adhemarius ypsilon;Euphyia sp_group_san_luis;Idaea tacturata;Adhemarius ypsilon;Hemerophila gradella;Acrosemia vulpecularia;Pyrgion repanda;Epimecis matronaria;Scopula sp_group_san_luis;Eois sp_san_luis_b;Virbia sp_san_luis_a;Conchylodes erinalis;Eois sp_san_luis_b;Epimecis matronaria;Melanolophia sp_san_luis;Melese monima;Rhabdatomis laudamia;Lirimiris inopinata;Josia sp_group_san_luis;Pseudodirphia menander;Josia sp_group_san_luis;Disphragis proba;Cliniodes opalalis;Melanolophia sp_san_luis;Prorifrons rufescens;Antiblemma concinnula;Melanolophia sp_san_luis;Nola sp_san_luis_a;Melanolophia sp_san_luis;Cliniodes opalalis;Eois sp_san_luis_b;Renia vinasalis;Eucereon relegata;Colla rhodope;Conchylodes erinalis;Oospila venezuelata;Semaeopus illimitata;Eucereon aurantiaca;Macaria approximaria--gambarina--ostia;Hemiceras modesta;Microphysetica hermeasalis;Nola sp_san_luis_a;Melanolophia sp_san_luis;Oospila venezuelata;Cautethia spuria;Euphyia sp_group_san_luis;Gn sp_san_luis_8046;Polypoetes villia;Synnomos sp_san_luis_a;Hemiceras pernubila;Hemerophila gradella;Nola sp_san_luis_a;Iridopsis validaria;Sarsina purpurascens;Argyrotome prospecta;Gn sp_san_luis_2008;Eulepidotis alabastraria;Gn sp_san_luis_7001;Desmia bajulalis;Eusarca asteria--cayennaria;Nola sp_san_luis_a;Oospila venezuelata;Macaria 
approximaria--gambarina--ostia;Epimecis matronaria;Glena mopsaria;Euphyia sp_group_san_luis;Iridopsis lurida--oberthuri--herse;Gizma undilinealis;Dichomeris arotrosema;Iridopsis lurida--oberthuri--herse;Disphragis proba;Gn sp_san_luis_7003;Eusarca asteria--cayennaria;Acharia hyperoche;Thysanopyga amarantha;Oospila athena;Glena mopsaria;Crambidia myrlosea;Marimatha nigrofimbria;Eucereon aurantiaca;Euphyia sp_group_san_luis;Gn sp_san_luis_1023;Pyrgion repanda;Microphysetica hermeasalis;Lobocleta tenellata;Scopula sp_group_san_luis;Clepsis sp_san_luis_a;Iridopsis lurida--oberthuri--herse;Argyrotome prospecta;Phrygionis polita;Marimatha nigrofimbria;Iridopsis lurida--oberthuri--herse;Semaeopus viridiplaga;Euphyia sp_group_san_luis;Iridopsis lurida--oberthuri--herse;Nephodia auxesia;Josia sp_group_san_luis;Herbita amicaria;Melanolophia sp_san_luis;Semaeopus illimitata;Thysanopyga amarantha;Opisthoxia miletia;Hymenia perspectalis;Gn sp_san_luis_4001;Gn sp_san_luis_8057;Oospila athena;Hymenia perspectalis 20120623 10.3 -84.8$Amorbia emigratella;Idaea sp_group_san_luis;Oospila athena;Lomographa argentata;Idaea sp_group_san_luis;Dysodia oculatana;Ptychamalia sp_san_luis_a;Gizma undilinealis;Udea rubigalis;Antaeotricha sp_san_luis_b;Bryoptera friaria;Euphyia sp_group_san_luis;Gn sp_san_luis_1028;Glena mopsaria;Epicrisias eschara;Gn sp_san_luis_1037;Hemiceras pernubila;Melanolophia sp_san_luis;Eois sp_san_luis_b;Lonomia electra;Gn sp_san_luis_1027;Macaria approximaria--gambarina--ostia;Gn sp_san_luis_1002;Gn sp_san_luis_1001;Eusarca asteria--cayennaria;Eucereon tigrata;Simopteryx torquataria;Phyllodonta latrata;Eois sp_san_luis_b;Thysanopyga amarantha;Oospila athena;Scopula sp_group_san_luis;Oospila venezuelata;Lineodes sp_san_luis_a;Gn sp_san_luis_1001;Glena mopsaria;Glena mopsaria;Amorbia emigratella;Oospila venezuelata;Oospila venezuelata;Oospila venezuelata;Nola sp_san_luis_a;Oospila venezuelata;Gn sp_san_luis_1002;Glena mopsaria;Gn sp_san_luis_1002;Anomis 
texana;Melanolophia sp_san_luis;Paragonia cruraria;Euphyia sp_group_san_luis;Gn sp_san_luis_1013;Spodoptera eridania;Eois sp_san_luis_f;Euphyia sp_group_san_luis;Herbita amicaria;Disphragis proba;Anomis texana;Hemiceras pernubila;Thysanopyga amarantha;Physocleora pauper;Disphragis proba;Acrosemia vulpecularia;Melanolophia sp_san_luis;Glena mopsaria;Gn sp_san_luis_1023;Gn sp_san_luis_1023;Gn sp_san_luis_1001;Gn sp_san_luis_1003;Bagisara laverna;Clemensia leopardina;Conchylodes erinalis;Gn sp_san_luis_1001;Bleptina caradrinalis;Gn sp_san_luis_1001;Nephodia auxesia;Leucula meganira;Nola sp_san_luis_a;Thysanopyga amarantha;Eois sp_san_luis_e;Orthofidonia sp_san_luis_a;Isogona continua--natatrix;Oospila venezuelata;Manduca lucetius;Manduca lucetius;Rhabdatomis laudamia;Gn sp_san_luis_7003;Oxydia trychiata;Idaea sp_group_san_luis;Stenoma byssina;Melanolophia sp_san_luis;Amorbia emigratella;Thysanopyga amarantha;Marimatha nigrofimbria;Iridopsis lurida--oberthuri--herse;Clepsis sp_san_luis_a;Sabulodes colombiata;Scopula sp_group_san_luis;Idalus crinis;Macrocneme iole;Urodus sp_san_luis_c;Hymenia perspectalis;Gonodonta paraequalis;Disphragis proba;Rhabdatomis laudamia;Ethmia exornata;Hymenia perspectalis;Gn sp_san_luis_2021;Quentalia subumbrata;Disphragis proba;Oospila venezuelata;Amorbia emigratella;Glena mopsaria;Herbita amicaria;Hylesia continua;Hylesia continua;Hylesia continua;Gn sp_san_luis_1023;Ethmia exornata;Gn sp_san_luis_1003;Psilosetia pura;Psilosetia pura;Psilosetia pura;Thysanopyga casperia;Glena mopsaria;Nola sp_san_luis_a;Herbita amicaria;Idaea sp_group_san_luis;Gn sp_san_luis_4001;Clemensia leopardina;Gn sp_san_luis_1004;Thysanopyga amarantha;Conchylodes erinalis;Disphragis proba;Semaeopus illimitata;Gn sp_san_luis_1010;Idaea sp_group_san_luis;Renia vinasalis;Euphyia sp_group_san_luis;Glena mopsaria;Josia sp_group_san_luis;Eois sp_san_luis_e;Stenoma byssina;Pyrgion repanda;Glena mopsaria;Cimicodes albicosta;Phyllodonta latrata;Lineodes sp_san_luis_a;Hymenia 
perspectalis;Acharia hyperoche;Conchylodes erinalis;Oxydia bilinea;Iridopsis lurida--oberthuri--herse;Hemiceras pernubila;Oxydia masthala;Isogona continua--natatrix;Amorbia emigratella;Cliniodes opalalis;Hypena livia;Cecharismena sp_san_luis_a;Hemiceras nigricosta;Glenopteris oculifera;Melanolophia sp_san_luis;Antaeotricha sp_san_luis_a;Leucanopsis longa;Anomis texana;Crambidia cephalica;Ascalapha odorata;Ascalapha odorata 20120623 33.9 -83.3$Renia flavipunctalis;Peridea basitriens;Peridea basitriens;Idaea obfusaria;Eulithis diversilineata;Clemensia albata;Acrolophus texanella;Chytonix palliatricula;Idia rotundalis;Spilosoma congrua;Spilosoma congrua;Spilosoma congrua;Parapediasia decorellus;Aethiophysa lentiflualis;Datana major;Datana major;Spodoptera ornithogalli;Cisthene packardii;Idaea obfusaria;Nigetia formosalis;Clemensia albata;Eutrapela clemataria;Ectropis crepuscularia;Heterocampa obliqua;Heterocampa obliqua;Baileya dormitans;Datana drexelii;Datana drexelii;Eulithis diversilineata;Crambidia pallida--uniformis;Arta statalis;Megalopyge opercularis;Megalopyge opercularis;Heterocampa obliqua;Isochaetes beutenmuelleri;Eudryas grata;Eudryas grata;Parasa chloris;Parasa chloris;Palthis asopialis;Apoda biguttata;Apoda biguttata;Paectes abrostoloides;Clemensia albata;Loxostegopsis merrickalis;Loxostegopsis merrickalis;Hypagyrtis esther;Idia julia;Microcrambus elegans;Megalopyge opercularis;Melanolophia signataria;Ectropis crepuscularia;Nigetia formosalis 20120624 10.3 -84.8$Nola sp_san_luis_a;Nola sp_san_luis_a;Gn sp_san_luis_2009;Amorbia emigratella;Eois sp_san_luis_b;Eupithecia sp_group_san_luis;Eusarca asteria--cayennaria;Gn sp_san_luis_1027;Glena mopsaria;Iridopsis validaria;Macaria carpo;Glena mopsaria;Campatonema lineata;Eusarca asteria--cayennaria;Ametris nitocris;Melanolophia sp_san_luis;Iridopsis lurida--oberthuri--herse;Thysanopyga carfinia;Conchylodes erinalis;Gn sp_san_luis_1004;Phyllodonta latrata;Gn sp_san_luis_1001;Nola sp_san_luis_a;Gn 
sp_san_luis_1001;Glena mopsaria;Gn sp_san_luis_1003;Euphyia sp_group_san_luis;Eois sp_san_luis_b;Euphyia sp_group_san_luis;Pareuchaetes insulata;Phyllodonta latrata;Glena mopsaria;Xylophanes porcus;Trygodes amphion;Synnomos sp_san_luis_a;Nola sp_san_luis_a;Eucereon aroa;Nephodia betala;Eusarca asteria--cayennaria;Thysanopyga amarantha;Gn sp_san_luis_1024;Amorbia emigratella;Conchylodes erinalis;Sphacelodes vulneraria;Sabulodes arge;Nola sp_san_luis_a;Gn sp_san_luis_1042;Euphyia sp_group_san_luis;Simopteryx torquataria;Clepsis sp_san_luis_a;Oospila venezuelata;Oxydia bilinea;Oospila venezuelata;Eusarca asteria--cayennaria;Anomis texana;Eusarca crameraria--brown;Nola sp_san_luis_a;Melanolophia sp_san_luis;Nola sp_san_luis_a;Melanolophia sp_san_luis;Gn sp_san_luis_8097;Gn sp_san_luis_1001;Nola sp_san_luis_a;Semaeopus viridiplaga;Gn sp_san_luis_1023;Iridopsis lurida--oberthuri--herse;Isochromodes caleta;Isochromodes caleta;Antaeotricha sp_san_luis_d;Crambidia myrlosea;Oospila venezuelata;Gn sp_san_luis_8032;Phyllodonta latrata;Amorbia emigratella;Oospila venezuelata;Gn sp_san_luis_2022;Glena mopsaria;Oospila venezuelata;Glena mopsaria;Euphyia sp_group_san_luis;Gn sp_san_luis_1001;Macaria carpo;Hemiceras pernubila;Gn sp_san_luis_1003;Pyrgion repanda;Gn sp_san_luis_1002;Gn sp_san_luis_1023;Opharus rudis;Tricentrogyna vinacea;Trygodes amphion;Anticarsia gemmatalis;Eois sp_san_luis_b;Glena mopsaria;Euphyia sp_group_san_luis;Synnomos sp_san_luis_a;Nola sp_san_luis_a;Glena mopsaria;Trygodes amphion;Lobocleta tenellata;Gn sp_san_luis_7001;Thysanopyga amarantha;Synnomos sp_san_luis_a;Gn sp_san_luis_1002;Gn sp_san_luis_1002;Gn sp_san_luis_1002;Eusarca asteria--cayennaria;Cratoptera zarumata;Sabulodes colombiata;Synnomos sp_san_luis_a;Stenoma byssina;Sanys irrosea;Gn sp_san_luis_1001;Gn sp_san_luis_1011;Euphyia sp_group_san_luis;Sanys irrosea;Herbita aglausaria;Eois sp_san_luis_b;Bagisara laverna;Anticarsia gemmatalis;Melanolophia sp_san_luis;Eois sp_san_luis_e;Synnomos urota;Gn 
sp_san_luis_1003;Iridopsis lurida--oberthuri--herse;Gn sp_san_luis_1002;Gn sp_san_luis_1003;Eusarca asteria--cayennaria;Gn sp_san_luis_8097;Opharus rudis;Thysanopyga amarantha;Scopula sp_group_san_luis;Gn sp_san_luis_1042;Idaea sp_group_san_luis;Eusarca asteria--cayennaria;Pareuchaetes insulata;Iridopsis validaria;Glena mopsaria;Gn sp_san_luis_1003;Herminocala sabata;Gizma undilinealis;Nola sp_san_luis_a;Herminocala sabata;Josia sp_group_san_luis;Melanolophia sp_san_luis;Gn sp_san_luis_7001;Polypoetes villia;Gn sp_san_luis_2006;Gn sp_san_luis_8073;Nola sp_san_luis_a;Eois sp_san_luis_b;Glena mopsaria;Gn sp_san_luis_7001;Melanolophia sp_san_luis;Gn sp_san_luis_1004;Idaea sp_group_san_luis;Nola sp_san_luis_a;Eusarca sp_san_luis_a;Gn sp_san_luis_1002;Gn sp_san_luis_8005;Idaea sp_group_san_luis;Dichomeris arotrosema;Simopteryx torquataria;Idaea sp_group_san_luis;Nola sp_san_luis_a;Thysanopyga amarantha;Simopteryx torquataria;Thysanopyga amarantha;Gn sp_san_luis_1023;Trygodes amphion;Eois sp_san_luis_b;Semaeopus viridiplaga;Euphyia sp_group_san_luis;Hemiceras pernubila;Elaphria sp_san_luis_a;Rivula leucosticta;Scopula sp_group_san_luis;Thysanopyga amarantha;Gn sp_san_luis_7001;Semaeopus viridiplaga;Dichomeris arotrosema;Melanolophia sp_san_luis;Epimecis semicompleta;Phrygionis platinata;Gn sp_san_luis_1005;Opharus rudis;Melanolophia sp_san_luis;Pachylia syces;Pachylia syces;Pachylia syces;Pachylia syces;Pachylia syces;Thysanopyga amarantha;Eois sp_san_luis_b;Semaeopus sp_san_luis_a;Bertholdia specularis;Melese monima;Rhabdatomis laudamia;Tosale aucta;Phostria tedea;Ischnurges eudamidasalis;Semaeopus viridiplaga;Trygodes amphion;Opisthoxia miletia;Eusarca melenda;Conchylodes erinalis;Conchylodes erinalis;Glena mopsaria;Melanolophia sp_san_luis;Hymenia perspectalis;Tosale aucta;Gn sp_san_luis_7001;Euphyia sp_group_san_luis;Gn sp_san_luis_8001;Scopula sp_group_san_luis;Glena mopsaria;Amorbia emigratella;Eois sp_san_luis_f;Amorbia emigratella;Amorbia emigratella;Hemerophila 
gradella;Amorbia emigratella;Givira lineaeplena;Rhabdatomis laudamia;Hypena andraca;Macaria carpo;Clepsis sp_san_luis_a;Euphyia sp_group_san_luis;Gn sp_san_luis_2020;Clepsis sp_san_luis_a;Clepsis sp_san_luis_a;Disphragis proba;Cliniodes opalalis;Gn sp_san_luis_7003;Hymenia perspectalis;Micrathetis dasarada;Iridopsis lurida--oberthuri--herse;Diphthera festiva;Gn sp_san_luis_1001;Thysanopyga amarantha;Desmia bajulalis;Idaea sp_group_san_luis;Euphyia sp_group_san_luis;Conchylodes concinnalis;Hymenia perspectalis;Euphyia sp_group_san_luis;Tetanolita mynesalis;Pachylioides resumens;Cratoptera zarumata;Iridopsis lurida--oberthuri--herse;Glena mopsaria;Euphyia sp_group_san_luis;Gn sp_san_luis_2003;Udea rubigalis;Cimicodes albicosta;Euphyia sp_group_san_luis;Renodes curviluna;Trygodes amphion;Iridopsis lurida--oberthuri--herse;Lobocleta tenellata;Iridopsis lurida--oberthuri--herse;Gn sp_san_luis_1001;Oospila venezuelata;Xenosoma nigromarginatum;Eucereon relegata;Iridopsis lurida--oberthuri--herse;Renodes curviluna;Macaria approximaria--gambarina--ostia;Glena mopsaria;Lineodes sp_san_luis_a;Cosmosoma impar;Eucereon tigrata;Eusarca asteria--cayennaria;Sabulodes colombiata;Pseudodirphia menander;Hymenia perspectalis;Eucereon tigrata;Sabulodes colombiata;Oospila venezuelata;Pyrgion repanda;Ischnurges eudamidasalis 20120624 33.9 -83.3$Acrolophus popeanella;Feltia subterranea;Acrolophus texanella;Acrolophus texanella;Neodactria luteolella;Neodactria luteolella;Renia flavipunctalis;Lithacodes fasciola;Bleptina inferior;Amydria effrentella;Cisthene packardii;Baileya ophthalmica;Protoboarmia porcelaria;Glyphidocera lactiflosella;Clemensia albata;Spilosoma congrua;Spilosoma congrua;Spilosoma congrua;Rhyacionia rigidana;Diatraea lisetta'

CR_query=re.compile(r'(\d\d\d\d\d\d\d\d\s)10.3\s-84.8\$(.*?)\d\d\d\d\d\d\d\d')

x=re.findall(CR_query,s)


d=[i[0] for i in x]
print "d", d

c=[i[1] for i in x]
print "c", c
print "len(c)", len(c)

Try using the regex:

(\d{8}\s)10\.3\s-84\.8\$(.*?)(?=\d{8}|$)

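Your current regex was preventing successive matches because the matches would overlap: the trailing \d\d\d\d\d\d\d\d consumes the next date, so the search for the following match resumes after that date and skips its record. I removed this by using a positive lookahead (?= ... ), which is a zero-width assertion: it checks that a date follows without consuming it. The |$ alternative lets the last record end at the end of the string.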
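To make the difference concrete, here is a minimal sketch that runs both patterns on a shortened, made-up sample laid out like the string in the question. The sample text and the names `consuming` and `lookahead` are illustrative assumptions, not part of the original code.

```python
import re

# Shortened, made-up sample in the same layout as the question's string:
# "<date> <lat> <lon>$<species;species;...>" records separated by a space.
s = ("20120622 10.3 -84.8$Glena mopsaria;Oospila athena "
     "20120623 10.3 -84.8$Amorbia emigratella;Udea rubigalis "
     "20120623 33.9 -83.3$Idaea obfusaria "
     "20120624 10.3 -84.8$Macaria carpo")

# Variant of the question's pattern: the trailing \d{8} consumes the next date.
consuming = re.compile(r'(\d{8}\s)10\.3\s-84\.8\$(.*?)\d{8}')
# Pattern from the answer: the lookahead only checks for the next date.
lookahead = re.compile(r'(\d{8}\s)10\.3\s-84\.8\$(.*?)(?=\d{8}|$)')

dates_consuming = [m[0] for m in consuming.findall(s)]
dates_lookahead = [m[0] for m in lookahead.findall(s)]

print(dates_consuming)  # ['20120622 ']
print(dates_lookahead)  # ['20120622 ', '20120623 ', '20120624 ']
```

With the consuming pattern, the second record is lost because its date was already swallowed by the first match, and the last record is lost because no date follows it. Note that the captured species list keeps the space that precedes the next date; calling `.strip()` on it would remove that.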

Also, I escaped your periods. Those should be escaped if intended to be literal dots, because an unescaped . matches any character (except a newline by default).
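As a small illustration of why the escaping matters, the sketch below compares the unescaped and escaped coordinate patterns against a made-up decoy string. The names `loose`, `strict`, and `decoy` are hypothetical and used only for this example.

```python
import re

# Unescaped dots: each "." matches any single character.
loose = re.compile(r'10.3\s-84.8\$')
# Escaped dots: each "\." matches only a literal period.
strict = re.compile(r'10\.3\s-84\.8\$')

# Made-up decoy with digits where the periods should be.
decoy = '1073 -8478$'
real = '10.3 -84.8$'

loose_hits_decoy = bool(loose.search(decoy))
strict_hits_decoy = bool(strict.search(decoy))
strict_hits_real = bool(strict.search(real))

print(loose_hits_decoy)   # True
print(strict_hits_decoy)  # False
print(strict_hits_real)   # True
```

In the question's data the unescaped version happens to work, but the escaped version states the intent and cannot match look-alike coordinates.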

regex101 demo

It doesn't look like your regex will match your string.

Your regex looks for 8 digits, followed by "10.3 -84.8$", then captures everything up to 8 more digits afterwards. However, it doesn't look like your string has those 8 digits anywhere else.
