Find text in HTML that matches a regex pattern

I want a regex solution that finds text values that look like MLA818214667, where the value appears in an id attribute such as id="MLA818214667". A value must satisfy three conditions to be matched:

  1. It starts with MLA and is the value of an id="" attribute.
  2. The number after MLA is more than 6 digits long.
  3. The part after MLA is purely numeric, with no letters mixed in.

Note: I want to avoid HtmlAgilityPack here because the text is not always valid HTML. I want to treat the input as plain text rather than HTML, so I need a solution that does not use any HTML parser.

C#:

var listOfIds = new List<string>();
string html = @"below html sample goes here";

Match match = Regex.Match(html, @"/([A-Za-z0-9\-]+)\.$",
            RegexOptions.IgnoreCase);

// Every matched ID should be added to listOfIds

Html:

<span class="main-title">
  Casco Integral Halcon H57 + Combo Termico Invierno Sti Motos
</span>
</h2>
<div class="item__status">
  <div class="item__condition">541 vendidos</div>
</div>
</div>
</a>
<form class="item__bookmark-form" action="/search/bookmarks/MLA614364106/make" method="post" id="bookmarkForm" class="bookmark-form">
  <button type="submit" class="bookmarks favorite" data-id="MLA614364106">
    <div class="item__bookmark">
      <div class="icon"></div>
    </div>
  </button>
  <input type="hidden" name="method" value='add'/>
  <input type="hidden" name="itemId" value='MLA614364106'/>
  <input type="hidden" name="_csrf" value="5fe7b4e6-19d3-42bc-a3bb-15eaeee81f64"/>
</form>
</div>
</li>
<li class="results-item highlighted article grid item-info-height-179">
  <div class="rowItem item highlighted item--grid item--has-row-logo new" id="MLA751765547">
    <div class="item__image item__image--grid">
      <div class="images-viewer" item-url="https://articulo.mercadolibre.com.ar/MLA-751765547-casco-moto-hawk-htl-dr46-rebatible-lett-store-_JM#position=5&amp;type=item&amp;tracking_id=897c653e-1565-4371-8a4d-b2ea29d09d4d" item-id="MLA751765547">
        <div class="carousel">
          <ul>
            <li><a href="https://articulo.mercadolibre.com.ar/MLA-751765547-casco-moto-hawk-htl-dr46-rebatible-lett-store-_JM#position=5&amp;type=item&amp;tracking_id=897c653e-1565-4371-8a4d-b2ea29d09d4d" class="item-link item__js-link">
                  <img class='lazy-load' width='284' height='284' alt='Casco Moto Hawk Htl Dr46 Rebatible Lett Store' src='https://http2.mlstatic.com/casco-moto-hawk-htl-dr46-rebatible-lett-store-D_NQ_NP_624166-MLA31021954439_062019-W.jpg'/>
                </a>
            </li>
          </ul>
        </div>
      </div>
    </div>
    <span class="item-loading__status-bar item-loading__hide"></span>
    <a href="https://articulo.mercadolibre.com.ar/MLA-751765547-casco-moto-hawk-htl-dr46-rebatible-lett-store-_JM#position=5&amp;type=item&amp;tracking_id=897c653e-1565-4371-8a4d-b2ea29d09d4d" class="item__info-link item__js-link">
      <div class="item__info ">
        <div class="item__price ">
          <span class="price__symbol">$</span>
          <span class="price__fraction">3.725</span>
        </div>
        <span class="item-installments item__installments--show-card-icon highlighted free-interest item--has-shipping">
          <span class="item-installments-text">Hasta 6 cuotas sin inter&eacute;s</span>
        </span>
        <div class="item__shipping-promise item__shipping highlighted free-shipping">
          <span class="text-shipping next_day">Llega gratis el lunes</span>
        </div>
        <div class="item__brand-logo item__brand-img--ultra-wide">
          <span class="item__brand-img-container">
            <img src="https://http2.mlstatic.com/D_NQ_NP_796276-MLA31050681849_062019-T.jpg"/>
          </span>
        </div>
        <h2 class="item__title list-view-item-title">
          <span class="main-title">Casco Moto Hawk Htl Dr46 Rebatible Lett Store</span>
        </h2>
        <div class="item__status">
          <div class="item__condition">362 vendidos</div>
        </div>
      </div>
    </a>
    <form class="item__bookmark-form" action="/search/bookmarks/MLA751765547/make" method="post" id="bookmarkForm" class="bookmark-form">
      <button type="submit" class="bookmarks favorite" data-id="MLA751765547">
        <div class="item__bookmark">
          <div class="icon"></div>
        </div>
      </button>
      <input type="hidden" name="method" value='add'/>
      <input type="hidden" name="itemId" value='MLA751765547'/>
      <input type="hidden" name="_csrf" value="5fe7b4e6-19d3-42bc-a3bb-15eaeee81f64"/>
    </form>
  </div>
</li>
<li class="results-item highlighted article grid item-info-height-179">
  <div class="rowItem item highlighted item--grid item--has-row-logo new to-item" id="MLA817988063">
    <div class="item__image item__image--grid">
      <div class="images-viewer" item-url="https://articulo.mercadolibre.com.ar/MLA-817988063-cascos-motos-vega-vflow-motocross-mx-enduro-atv-acces-cam-_JM#position=6&amp;type=item&amp;tracking_id=897c653e-1565-4371-8a4d-b2ea29d09d4d" item-id="MLA817988063">
        <div class="carousel">
          <ul>
            <li>
              <a href="https://articulo.mercadolibre.com.ar/MLA-817988063-cascos-motos-vega-vflow-motocross-mx-enduro-atv-acces-cam-_JM#position=6&amp;type=item&amp;tracking_id=897c653e-1565-4371-8a4d-b2ea29d09d4d" class="item-link item__js-link">
                <img class='lazy-load' width='284' height='284' alt='Cascos Motos Vega Vflow Motocross Mx Enduro Atv + Acces Cam' src='https://http2.mlstatic.com/cascos-motos-vega-vflow-motocross-mx-enduro-atv-acces-cam-D_NQ_NP_629038-MLA32405702773_102019-W.jpg' />
              </a>
            </li>
          </ul>
        </div>

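You can use the pattern "id=\"(MLA[0-9]{6,})\"" to find all the id values in the HTML. The capture group holds the ID itself. Note that {6,} accepts six or more digits; if the number must be strictly longer than six digits, use {7,} instead.

Paste the regex into https://regex101.com to see how it works.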

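using System.Collections.Generic;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        var listOfIds = new List<string>();
        string html = " id=\"MLA12334566\"  id=\"MLA123354566\" id=\"MLA123346566\"";

        // Group 1 captures the ID itself: "MLA" followed by six or more digits.
        Regex idRegex = new Regex("id=\"(MLA[0-9]{6,})\"");

        var matches = idRegex.Matches(html);

        // Declare the loop variable as Match; on older frameworks "var" is inferred as object here.
        foreach (Match match in matches)
        {
            // match.Value would be the whole id="..." text; Groups[1] is just the ID.
            listOfIds.Add(match.Groups[1].Value);
        }
    }
}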
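
The sample HTML in the question also contains data-id="MLA..." and item-id="MLA..." attributes, values in single quotes, and the same ID repeated several times. The following is a minimal sketch of a stricter variant, not the answerer's code. It assumes that only the plain id attribute should count, that either quote style is acceptable, that "more than 6" means at least seven digits, and that duplicates should be dropped. IdExtractor and ExtractIds are hypothetical names introduced for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class IdExtractor
{
    // Hypothetical helper, not part of the original answer.
    // (?<![\w-])   : "id" must not be the tail of another name (data-id, item-id, itemId).
    // \s*=\s*      : tolerate spaces around '='.
    // [""']        : accept a double or a single quote (assumption).
    // [0-9]{7,}    : strictly more than six digits (assumption about requirement 2).
    private static readonly Regex IdPattern = new Regex(
        @"(?<![\w-])id\s*=\s*[""'](MLA[0-9]{7,})[""']");

    public static List<string> ExtractIds(string html)
    {
        return IdPattern.Matches(html)
            .Cast<Match>()                   // works whether or not MatchCollection is generic
            .Select(m => m.Groups[1].Value)  // group 1 is the ID without the id="..." wrapper
            .Distinct()                      // drop repeated IDs
            .ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        string html = "<div id=\"MLA751765547\"></div>"
                    + "<button data-id=\"MLA614364106\"></button>"   // skipped: data-id
                    + "<div id='MLA817988063'></div>"
                    + "<div id=\"MLA123\"></div>"                    // skipped: too short
                    + "<div id=\"MLA12345AB\"></div>";               // skipped: not numeric

        foreach (string id in IdExtractor.ExtractIds(html))
        {
            Console.WriteLine(id);
        }
    }
}
```

If data-id and item-id values should be collected as well, removing the (?<![\w-]) lookbehind restores the looser behaviour of the original pattern.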
