
What is the difference between XML data and XML metadata?

I'm rebuilding some XML feeds, so I am researching when to use elements and when to use attributes with XML.

Several sites have said "Data goes in elements, metadata in attributes."

So, what is the difference between the two?

Let's take an example from W3Schools:

<note date="12/11/2002">
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>

Should the date stay as an attribute of the note element? Or does it make more sense to go into its own element?

<date>12/11/2002</date>

Or, does it make sense for it to be separated into multiple elements?

<date>
  <day>12</day>
  <month>11</month>
  <year>2002</year>
</date>

Following the rule "Data goes in elements, metadata in attributes," I would have made the date a child element. You don't need to break it down into day, month, and year, because an XSD can declare that an element must be of a date type (xs:date). I think an example of "metadata" here would be a noteID field or perhaps a noteType. Example:

<note id="NID0001234" type="reminder">
  <date>2002-11-12</date>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>
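One practical payoff of writing the date as 2002-11-12 is that it is ISO 8601, the lexical form XSD's xs:date expects, so any consumer can parse it without guessing whether 12/11 means December 11 or November 12. A minimal Python sketch (standard library only, no XSD validation) that reads the restructured note and turns its <date> element into a real date value:

```python
import xml.etree.ElementTree as ET
from datetime import date

NOTE = """<note id="NID0001234" type="reminder">
  <date>2002-11-12</date>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>"""

def note_date(xml_text):
    """Return the <date> child of a <note> as a datetime.date."""
    root = ET.fromstring(xml_text)
    # YYYY-MM-DD parses unambiguously; a slash form like 12/11/2002 would not.
    return date.fromisoformat(root.findtext("date"))

print(note_date(NOTE))
```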

UPDATE: As many others have pointed out, it can be rather subjective. I try to separate the two by how they will be used. Data will usually be presented to the user, metadata will control the presentation and may be used internally for other purposes. But there are always exceptions...
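To make that split concrete, here is a hedged Python sketch of a consumer that shows the element content to the user, lets the type attribute steer how it is presented, and keeps the id only for internal bookkeeping. The PREFIXES mapping and the render function are hypothetical, invented for illustration:

```python
import xml.etree.ElementTree as ET

NOTE = """<note id="NID0001234" type="reminder">
  <date>2002-11-12</date>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>"""

# Hypothetical mapping from the note's type attribute to a display prefix.
PREFIXES = {"reminder": "[!]", "memo": "[memo]"}

def render(note):
    """Return (internal key, user-facing text) for one <note> element."""
    prefix = PREFIXES.get(note.get("type"), "")  # metadata: steers presentation
    heading = note.findtext("heading")           # data: shown to the user
    body = note.findtext("body")                 # data: shown to the user
    text = f"{prefix} {heading}: {body}".strip()
    return note.get("id"), text                  # metadata: used internally only

print(render(ET.fromstring(NOTE)))
```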

The distinction between data and metadata is almost entirely subjective. One man's data is another's metadata. The "metadata in attributes" rule grew out of the markup world, where a rule of thumb was that if you removed all of the markup and left just the text, you should still have a reasonable document. This meant attributes should be discardable and elements essential. If you display XML in an uncomprehending browser, it will be treated this way.
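The "strip the markup, keep the text" test is easy to simulate. In this Python sketch, ElementTree's itertext() yields only character data, so the element content survives while the date attribute silently disappears, which is exactly the "attributes are discardable" view:

```python
import xml.etree.ElementTree as ET

NOTE = """<note date="12/11/2002">
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>"""

def text_only(xml_text):
    """Drop all markup and return the remaining text, one space between pieces."""
    root = ET.fromstring(xml_text)
    # itertext() walks text and tails only; attribute values never appear.
    return " ".join(t.strip() for t in root.itertext() if t.strip())

print(text_only(NOTE))
```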

But your XML (and most XML these days) likely won't be displayed to the user in an uncomprehending browser, so you can use better rules for how to design your XML.

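For example, you can have multiple elements with the same name, but not multiple attributes with the same name on one element. And whitespace in attribute values is normalized by the parser (newlines and tabs become spaces, and runs of spaces may be collapsed), whereas whitespace in element content is preserved.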
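The repetition point can be checked with any conforming parser. A small Python sketch using the standard library (the second recipient "Anna" is made up for the example): repeated <to> elements parse fine, while a repeated to attribute is a well-formedness error and ElementTree raises ParseError:

```python
import xml.etree.ElementTree as ET

def recipients_as_elements():
    """Repeated child elements are legal and come back as a list."""
    root = ET.fromstring("<note><to>Tove</to><to>Anna</to></note>")
    return [t.text for t in root.findall("to")]

def repeated_attribute_rejected():
    """A repeated attribute name on one element is not well-formed XML."""
    try:
        ET.fromstring('<note to="Tove" to="Anna"/>')
    except ET.ParseError:
        return True
    return False

print(recipients_as_elements(), repeated_attribute_rejected())
```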

There are differing views on the principles to use when deciding whether to use an attribute or an element for a piece of data. For example, see this old article from IBM, which lays out a bunch of proposed principles and then decorates the whole article with a giant caveat that says (essentially) "there are lots of exceptions and these principles are not intended to be prescriptive."

I think the main thing is to be internally consistent within your own world, however large that is. Your "world" could be a single schema, in which case you should be consistent in your approach: every element within that schema should be philosophically consistent. Or your world could be a set of related schemas, all XML documents emitted by a particular company, or even all XML schemas used by an industry or technology group.

Now, regarding the sample you offered:

<note date="12/11/2002">
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend! Remember what happened last time you forgot!!!</body>
</note>

...this seems internally inconsistent because only one piece of data is factored out, and there doesn't seem to be a good reason to do so.

Better if all the items were attributes or all were elements. One exception: the longish body element should probably always be an element. This feels right to me:

<note date="12/11/2002" to="Tove" from="Jani" heading="Reminder">
  <body>Don't forget me this weekend! Remember what happened last time you forgot!!!</body>
</note>
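If you already have notes in the element-heavy form, converting them to this consistent shape is mechanical. A hedged Python sketch: hoist_to_attributes and the KEEP_AS_ELEMENT policy set are hypothetical names, and the rule "short leaf children become attributes, body stays an element" is just the policy described above:

```python
import xml.etree.ElementTree as ET

NOTE = """<note date="12/11/2002">
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>"""

# Hypothetical policy: fields that stay elements (long or whitespace-sensitive).
KEEP_AS_ELEMENT = {"body"}

def hoist_to_attributes(note):
    """Move short leaf children into attributes, in place; keep the rest as elements."""
    for child in list(note):  # iterate over a copy because we remove children
        if child.tag not in KEEP_AS_ELEMENT and len(child) == 0:
            note.set(child.tag, (child.text or "").strip())
            note.remove(child)
    return note

note = hoist_to_attributes(ET.fromstring(NOTE))
print(ET.tostring(note, encoding="unicode"))
```

Listing the exceptions in one set keeps the policy in a single place, which is the kind of internal consistency argued for above.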

Putting the body into an attribute hurts readability, which argues for putting it into an element.

Keep in mind that whitespace can be normalized or collapsed in attribute values (source: that IBM article I cited). The firm rule that follows is: if whitespace is meaningful, use an element.
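Here is the same two-line text stored once as an attribute and once as element content, parsed with Python's standard library. Per XML 1.0 attribute-value normalization, the parser turns the newline in the attribute into a space; further collapsing of runs only applies to attributes declared with a non-CDATA type in a DTD, which this sketch does not use:

```python
import xml.etree.ElementTree as ET

# The same multi-line text, once as an attribute value and once as element content.
DOC = '<note body="line one\n  line two"><body>line one\n  line two</body></note>'

root = ET.fromstring(DOC)
attr_value = root.get("body")       # newline replaced by a space during parsing
elem_value = root.findtext("body")  # newline and indentation kept as written

print(repr(attr_value))
print(repr(elem_value))
```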

Now, if the heading in that XML fragment is something like an email subject, I'd probably factor it out into an element as well, since subjects can be lengthy.

As for your question regarding the month/day/year of the date: yes, factor those out if tools that process the XML need easy access to the individual values. It's easier to search for all notes from before 2009 with an XPath expression that does not have to do string parsing and then string-to-number conversion, if you see what I mean. On the other hand, if your use of the XML does not require selecting or searching on those individual values (month, day, year), keep them consolidated in a human-readable form as in your original.
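A rough Python comparison of the two layouts. ElementTree's XPath subset has no numeric comparisons, so the filtering happens in Python; the point is that the split form needs only a lookup and an int(), while the attribute form must first cut the year out of a string. The two-note feeds are invented sample data, and before_attr assumes the DD/MM/YYYY order used in the question:

```python
import xml.etree.ElementTree as ET

# Hypothetical feeds holding the same two notes in the two layouts.
SPLIT = """<notes>
  <note><date><day>12</day><month>11</month><year>2002</year></date></note>
  <note><date><day>3</day><month>1</month><year>2010</year></date></note>
</notes>"""

ATTR = """<notes>
  <note date="12/11/2002"/>
  <note date="03/01/2010"/>
</notes>"""

def before_split(xml_text, year):
    """Year is its own element: one path lookup, one int conversion."""
    root = ET.fromstring(xml_text)
    return [n for n in root.iter("note") if int(n.findtext("date/year")) < year]

def before_attr(xml_text, year):
    """Year is buried in a string: parse it out first (assumes DD/MM/YYYY)."""
    root = ET.fromstring(xml_text)
    return [n for n in root.iter("note") if int(n.get("date").split("/")[2]) < year]

print(len(before_split(SPLIT, 2009)), len(before_attr(ATTR, 2009)))
```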


tl;dr: There are few firm rules. As long as your use of elements and attributes is consistent, it will be easy for other developers and tools to understand and use.
