How can I get the regex to work?

I want to use a regex to find the parent node of the 16-digit number and return that whole section, but I can't figure out how. Given:

<Details>
<CreditCard cardnum="1234567890123456" ccv="123" exp="0212" cardType="1" name="joe" />
</Details>

I want to return:

<CreditCard cardnum="1234567890123456" ccv="123" exp="0212" cardType="1" name="joe" />

I am then going to parse the XML, find every attribute whose value is a number, and remove it.

I tried .*(\\d{13,16}).* , but this matches every character.

Once I do:

XElement element = XElement.Parse(xml); // XDocument.Load(xmlFile).Root

IEnumerable<XElement> elementsWithPossibleCCNumbers = 
        element.Descendants()
               .Where(d => d.Attributes()
                            .Where(a => a.Value.Length == 16)
                            .Count() == 1); 

I can't figure out how to loop through each attribute in elementsWithPossibleCCNumbers, for example:

foreach(var x in elementsWithPossibleCCNumbers)
{
//If attribute is number, replace value with empty string
}

Note: I removed the int.TryParse for now.

I decided to do this:

IEnumerable<XElement> elementsWithPossibleCCNumbers = 
        element.Descendants()
               .Where(d => d.Attributes()
                            .Where(a => a.Value.Length >= 13 && a.Value.Length <= 16)
                            .Count() == 1).Select(x=>x);


foreach(var x in elementsWithPossibleCCNumbers)
{
   foreach(var a in x.Attributes())
   {

   xml = xml.Replace(a.Value, new String('*',12));
   }
}

However, if I have a second element with a 16-digit attribute, it only replaces part of that attribute's value.
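A likely cause of the partial replacement is that the inner loop calls xml.Replace for every attribute of a matching element, including short values such as ccv="123", and string.Replace rewrites every occurrence of that text anywhere in the document, including inside another card number. The sketch below instead sets each matching attribute's Value directly, so nothing outside that attribute is touched. The names CardMasker, Mask, and LooksLikeCardNumber are hypothetical, and the 13 to 16 digit rule is simply carried over from the code above.

```csharp
using System.Linq;
using System.Xml.Linq;

static class CardMasker
{
    // Hypothetical helper: true when the value is 13 to 16 ASCII digits.
    static bool LooksLikeCardNumber(string value) =>
        value.Length >= 13 && value.Length <= 16 &&
        value.All(c => c >= '0' && c <= '9');

    // Parses the XML, masks every attribute that looks like a card number,
    // and returns the modified XML as a single-line string.
    public static string Mask(string xml)
    {
        XElement root = XElement.Parse(xml);

        // DescendantsAndSelf() so the root element's attributes are checked too.
        var matches = root.DescendantsAndSelf()
                          .Attributes()
                          .Where(a => LooksLikeCardNumber(a.Value))
                          .ToList();

        foreach (XAttribute attr in matches)
        {
            // Only this attribute changes; other text in the document is untouched.
            attr.Value = new string('*', attr.Value.Length);
        }

        return root.ToString(SaveOptions.DisableFormatting);
    }
}
```

Calling CardMasker.Mask(xml) on a document with several card-like attributes masks each one in full and leaves short values such as ccv alone.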

I wrote up another method to try out. The regex now only validates the attribute value, not the XML itself. I have no idea what you want to return from this method, but this should at least get you started on not using Regex for XML.

[Test]
public void X()
{
    const string xml = "<Details><CreditCard cardnum=\"1234567890123456\" ccv=\"123\" exp=\"0212\" cardType=\"1\" name=\"joe\" /><donotfind>333</donotfind></Details>";

    var doc = new XmlDocument();
    doc.LoadXml(xml);

    Console.WriteLine(doc.Name);

    foreach(XmlNode x in doc.ChildNodes)
    {
        ExploreNode(x);
    }
}

void ExploreNode(XmlNode node)
{
    Console.WriteLine(node.Name);

    if (node.Attributes != null)
    {
        foreach (XmlAttribute attr in node.Attributes)
        {
            Console.WriteLine("\t{0} -> {1}", attr.Name, attr.Value);

            if (attr.Value.Length == 16 && Regex.IsMatch(attr.Value, @"\d{16}"))
            {
                Console.WriteLine("\t\tCredit Card # found!");
            }
        }
    }

    foreach (XmlNode child in node.ChildNodes)
    {
        ExploreNode(child);
    }
}

Since your XML can vary a great deal, I would do something like the following.

Assuming XML like:

<Details> 
<CreditCard cardnum="1234567890123456" 
            ccv="123" 
            exp="0212" 
            cardType="1" 
            name="joe" /> 
</Details> 

Agnostic-ish code:

XElement element = XElement.Parse(xml); // XDocument.Load(xmlFile).Root

// A 16-digit value overflows int, so int.TryParse would always fail;
// long.TryParse is used instead and the parsed value is discarded.
IEnumerable<XElement> elementsWithPossibleCCNumbers = 
        element.Descendants()
               .Where(d => d.Attributes()
                            .Where(a => a.Value.Length == 16)
                            .Where(a => long.TryParse(a.Value, out _))
                            .FirstOrDefault() != null);

// Use elementsWithPossibleCCNumbers

This could be extended to also check the number of attributes:

// long.TryParse because a 16-digit value does not fit in an int.
IEnumerable<XElement> elementsWithPossibleCCNumbers =
        element.Descendants()
               .Where(d => d.Attributes()
                            .Where(a => a.Value.Length == 16)
                            .Where(a => long.TryParse(a.Value, out _))
                            .FirstOrDefault() != null
                           && d.Attributes().Count() == 5);

There are many possibilities that involve neither Regex nor hard-coded XML element names. I tend to use Regex as a last resort, especially if there is something better that can parse all the data for me.

Update 1

elementsWithPossibleCCNumbers are XML elements that contain one or MORE attributes that are 16 digits in length and are an integer. That being the case, you can't tell, so I would change it to:

// long.TryParse because a 16-digit value does not fit in an int.
IEnumerable<XElement> elementsWithPossibleCCNumbers = 
        element.Descendants()
               .Where(d => d.Attributes()
                            .Where(a => a.Value.Length == 16)
                            .Where(a => long.TryParse(a.Value, out _))
                            .Count() == 1);
                            // Where exactly 1 attribute is 16 characters long and numeric

Extending it again to select the matching attributes themselves:

// long.TryParse because a 16-digit value does not fit in an int.
IEnumerable<XAttribute> attributesWithPossibleCCNumbers =
        element.Descendants()
               .Where(d => d.Attributes()
                            .Where(a => a.Value.Length == 16)
                            .Where(a => long.TryParse(a.Value, out _))
                            .Count() == 1)
               .Select(e => e.Attributes()
                             .Where(a => a.Value.Length == 16)
                             .Where(a => long.TryParse(a.Value, out _))
                             .First());

Try using: <[^>]+[0-9]{16}[^>]+>

Edit: This might be more efficient: <([^>0-9]+)([0-9]{16})([^>]+)>
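As a rough illustration of the first pattern, the sketch below runs it with Regex.Match and returns the matched tag. The names TagRegexDemo and FirstTagWithSixteenDigits are hypothetical. Because [^>] cannot cross a closing angle bracket, the match stays inside a single tag rather than swallowing the whole input the way .* does.

```csharp
using System.Text.RegularExpressions;

static class TagRegexDemo
{
    // The first pattern suggested above: one tag containing a run of 16 digits.
    public const string Pattern = "<[^>]+[0-9]{16}[^>]+>";

    // Returns the first tag that contains 16 consecutive digits,
    // or an empty string when nothing matches.
    public static string FirstTagWithSixteenDigits(string xml)
    {
        Match m = Regex.Match(xml, Pattern);
        return m.Success ? m.Value : string.Empty;
    }
}
```

This is only a text-level match: it would also hit 16 digits inside a longer digit run, and it breaks if an attribute value contains a > character, which is part of why the parser-based answers are usually preferred.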

Don't use Regex to parse XML. It's not well suited to it.

How about using XmlDocument or XDocument instead?
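A minimal sketch of the XDocument route, assuming the goal is still to return the element that carries a 16-digit attribute. CardElementFinder, Find, and IsSixteenDigits are hypothetical names, and the check is a plain digit test rather than real card validation.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class CardElementFinder
{
    // Hypothetical helper: true when the value is exactly 16 ASCII digits.
    static bool IsSixteenDigits(string value) =>
        value.Length == 16 && value.All(c => c >= '0' && c <= '9');

    // Returns every element that has at least one 16-digit attribute value.
    public static IEnumerable<XElement> Find(string xml) =>
        XDocument.Parse(xml)
                 .Descendants()
                 .Where(e => e.Attributes().Any(a => IsSixteenDigits(a.Value)));
}
```

For the sample document, CardElementFinder.Find(xml).First() is the CreditCard element, and calling ToString() on it gives that element back as XML text.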
