Opened 11 years ago

Closed 7 years ago

#54 closed defect (fixed)

Questions/Issues in connection with UCUM v1.8.2

Reported by: Hans Jonkers Owned by: Gunther Schadow
Priority: minor Milestone: Version 2.0
Component: Keywords:
Cc: hans.jonkers@…

Description

During the development and testing of our UCUM implementation (based on version 1.8.2 of the standard) we came across a number of issues related to the interpretation of the standard and inconsistencies with e.g. the UCUM functional test set and the Regenstrief conversion tool. I would like to report these issues to you here; see the attached PDF file.

Change History (10)

comment:1 Changed 11 years ago by Hans Jonkers

Priority: blocker → minor

comment:2 Changed 11 years ago by Hans Jonkers

My attempts to attach the PDF file referred to above to the ticket failed because of an internal server error, and attempts to report that error failed for the same reason. The textual contents of the PDF file are therefore reproduced below:

  1. According to the standard:

"All expressions of The Unified Code for Units of Measure shall be built from characters of the 7-bit US-ASCII character set exclusively."

I would reckon that this applies to annotations also, but in the UCUM functional tests file (UcumFunctionalTests.xml) there are examples of annotations containing Unicode characters that are claimed to be valid. Are Unicode characters allowed inside annotations?

  1. The rules with respect to the use of whitespace in terms are not completely clear. It is clear that whitespace may not occur inside terminal symbols:

"Terminal unit symbols can consist of all ASCII characters in the range of 33–126 (0x21–0x7E) excluding …"

But may it occur anywhere else in a term? If so, does whitespace act as a separator? I would expect that a number is also a terminal symbol, but according to the Regenstrief conversion tool spaces are allowed in numbers, e.g. "3 2" is interpreted as 32 and not as 3².

  1. The BNF syntax for terms is somewhat confusing because of the following two paragraphs in the standard:

§8 integer numbers A positive integer number may appear in place of a simple unit symbol.

§10 nested terms  Unit terms with operators may be enclosed in parentheses (‘(’ and ‘)’) and used in place of simple units.

I would expect that these two rules would have been incorporated in the BNF syntax, i.e. that <simple-unit> and <component> would have been defined like this:

<simple-unit> ::= <ATOM-SYMBOL>
                | <PREFIX-SYMBOL><ATOM-SYMBOL>
                | <factor>
                | “(”<term>“)”

<component>   ::= <annotatable><annotation>
                | <annotatable>
                | <annotation>

This syntax includes "(3)2" as a proper unit term, although the rule from §10 of the standard strictly implies that the expression between parentheses should contain operators. The Regenstrief conversion tool accepts "(3)2" which evaluates to 9. So, should the first line of §10 not read as follows?

§10 nested terms  Unit terms may be enclosed in parentheses (‘(’ and ‘)’) and used in place of simple units.
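The proposed reading can be illustrated with a minimal sketch in Python. It is not the UCUM grammar: it handles only numbers, parentheses, the "." and "/" operators, and an exponent directly after a closing parenthesis. The function name `evaluate` and this restriction are assumptions made for illustration.

```python
from fractions import Fraction

DIGITS = "0123456789"


def evaluate(term):
    """Evaluate a numbers-only unit term such as "(3)2" (hypothetical helper)."""
    pos = 0

    def at(chars):
        # True if the current character exists and is one of `chars`.
        return pos < len(term) and term[pos] in chars

    def digits():
        nonlocal pos
        start = pos
        while at(DIGITS):
            pos += 1
        if pos == start:
            raise ValueError(f"digit expected at position {pos} in {term!r}")
        return int(term[start:pos])

    def exponent():
        nonlocal pos
        sign = 1
        if at("+-"):
            sign = -1 if term[pos] == "-" else 1
            pos += 1
        return sign * digits()

    def simple_unit():
        nonlocal pos
        if at("("):
            pos += 1
            value = parse_term()
            if not at(")"):
                raise ValueError(f"')' expected at position {pos} in {term!r}")
            pos += 1
            # Simplification: an exponent is only accepted after a nested term.
            if at(DIGITS + "+-"):
                value = value ** exponent()
            return value
        return Fraction(digits())

    def parse_term():
        nonlocal pos
        value = simple_unit()
        while at("./"):
            op = term[pos]
            pos += 1
            rhs = simple_unit()
            value = value * rhs if op == "." else value / rhs
        return value

    result = parse_term()
    if pos != len(term):
        # Anything left over, including whitespace, is rejected.
        raise ValueError(f"unexpected character at position {pos} in {term!r}")
    return result
```

Under these assumptions, `evaluate("(3)2")` treats the parenthesized number as a simple unit raised to the power 2, while `evaluate("32")` reads the digits as a single integer.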

  1. The names of the conversion functions used in the "ucum-essence.xml" file are not always consistent with the names used in the html definition of the standard (e.g. "logTimes2" instead of "2lg").
  1. The first conversion function from the function pair "degf(5 K/9)" as declared in "ucum-essence.xml" seems to have the following definition (due to the scale factor in the declaration of the function): fF(x) = x - 459.67 rather than fF(x) = 9/5 x - 459.67. Or should "degf(5 K/9)" be "degf(1 K)"?
  1. In the file "ucum-essence.xml", there is an inconsistency in the values of the "Unit" attributes of the following two units (one is specified as "rad" and the other as "deg"):

<unit Code="[p'diop]" CODE="[P'DIOP]" isMetric="no" isSpecial="yes" class="clinical">
  <name>prism diopter</name>
  <printSymbol>PD</printSymbol>
  <property>refraction of a prism</property>
  <value Unit="100tan(1 rad)" UNIT="100TAN(1 RAD)">
    <function name="tanTimes100" value="1" Unit="deg"/>
  </value>
</unit>
<unit Code="%[slope]" CODE="%[SLOPE]" isMetric="no" isSpecial="yes" class="clinical">
  <name>percent of slope</name>
  <printSymbol>%</printSymbol>
  <property>slope</property>
  <value Unit="100tan(1 rad)" UNIT="100TAN(1 RAD)">
    <function name="100tan" value="1" Unit="deg"/>
  </value>
</unit>
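Why the "rad" versus "deg" distinction matters numerically can be seen in a small Python sketch. The function names below are hypothetical, and the sketch makes no claim about which unit ucum-essence.xml intends; it only shows how far apart the two readings are.

```python
import math


def tan_times_100(angle_rad):
    """100 * tan(angle), with the angle given in radians (hypothetical helper)."""
    return 100 * math.tan(angle_rad)


def tan_times_100_inverse(value):
    """Inverse of tan_times_100; returns the angle in radians."""
    return math.atan(value / 100)


# Reading the argument "1" as one radian versus one degree:
as_radian = tan_times_100(1.0)                 # roughly 155.7
as_degree = tan_times_100(math.radians(1.0))   # roughly 1.75
```

The two readings differ by almost a factor of 90, so an implementation that takes the `Unit` attribute literally would produce very different conversions depending on which attribute it trusts.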

  1. The canonical form of a term containing an arbitrary unit is defined in our tool as the term itself. This is based on the following paragraph from the standard:

§25 operations on arbitrary units Any term involving arbitrary units, is itself an arbitrary unit and is not comparable with any other arbitrary unit or term.

In the Regenstrief conversion tool, however, terms containing arbitrary units are evaluated where arbitrary units are treated as dimensionless units. Is this due to the tool being based on an older version of the standard?

  1. In the ucum-essence.xml file the value of arbitrary units is still defined like this:

<value Unit="1" UNIT="1" value="1">

  1. According to the standard and the functional tests file, "[IU]" is not a valid unit. So why is it contained in table 19? It is also contained in the ucum-essence.xml file, but there is no way to deduce that it is an invalid code.
  1. It would be nice if there was an XML schema for the ucum-essence.xml file.
  1. Some typos/errors in Table 26 (Example Unit Terms by Term):
  1. In the row with unit term = "g/(kg.d)", the cells 3-7 have shifted one place to the right.
  2. Several errors in the canonical forms:

○ Canonical form value of "/kg{body'wt}" should be .001 and not 1000.
○ Canonical form values such as 1015, 1018, etc. should be 10¹⁵, 10¹⁸, etc.
○ Canonical form of "dB" is not correct (1 is the canonical form of "0.dB").

  1. In the UcumFunctionalTest.xml file the following strings are claimed to be valid units, which in my opinion is not correct:
  1. "" (the empty string)
  2. "{錠}rad2{錠}" (even with no Unicode characters "{…}rad2{…}" is not a valid unit)

Is it right that these are not valid units?

  1. There are several examples where the Regenstrief conversion tool seems to be inconsistent with the standard, e.g. "2{spoon}" is not allowed, while "cl{spoon}" is allowed, which is inconsistent with §8 of the standard. I assume, as already expressed above, that this is due to the fact that the tool is based on another version of the standard.

I hope these remarks will help in further improving the quality of the standard.

comment:3 Changed 10 years ago by Gunther Schadow

Milestone: Revision 1.9
Owner: set to Gunther Schadow
Status: new → assigned

Thank you for these important observations. We do need to take care of them. This is also related to #4, which has long been waiting for resolution. We should take this all up for release 1.9 (or 2.0).

comment:4 Changed 10 years ago by Gunther Schadow

Specific comments:

  • The functional test cases appear wrong at this time. Unicode is not allowed in unit symbols, but we could allow it in curly braces, because curly braces have no meaning given by UCUM.
  • Interpreting 3-space-2 as 32 seems far-fetched; on the other hand, spaces should not occur at all, ever. We could outlaw them altogether or, as the Regenstrief Unit Conversion implementation did, ignore them. Using spaces as separators invites ambiguity and error; I would not recommend it.

comment:5 Changed 7 years ago by Gunther Schadow

Milestone: Revision 1.9

deferred for next release as far as #4 issues are concerned.

comment:6 Changed 7 years ago by Gunther Schadow

Milestone: Version 2.0

comment:7 in reply to:  2 Changed 7 years ago by Gunther Schadow

Replying to hans.jonkers:

  1. According to the standard:

"All expressions of The Unified Code for Units of Measure shall be built from characters of the 7-bit US-ASCII character set exclusively."

I would reckon that this applies to annotations also, but in the UCUM functional tests file (UcumFunctionalTests.xml) there are examples of annotations containing Unicode characters that are claimed to be valid. Are Unicode characters allowed inside annotations?

This has been resolved by Grahame Grieve uploading a new version of these test cases.

  1. The rules with respect to the use of whitespace in terms are not completely clear. It is clear that whitespace may not occur inside terminal symbols:

"Terminal unit symbols can consist of all ASCII characters in the range of 33–126 (0x21–0x7E) excluding …"

But may it occur anywhere else in a term? If so, does whitespace act as a separator? I would expect that a number is also a terminal symbol, but according to the Regenstrief conversion tool spaces are allowed in numbers, e.g. "3 2" is interpreted as 32 and not as 3².

White space is not otherwise recognized. It should probably be handled as an error rather than ignored. It is intentionally not used as a separator. Added as comment:

			<p>
White space is not recognized in a unit term and should generally 
not occur. UCUM implementations may flag whitespace as an error 
rather than ignore it. Whitespace is not used as a separator of 
otherwise ambiguous parts of a unit term.
			</p>
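An implementation that flags whitespace as an error, as the added comment permits, might look like the following minimal Python sketch. The function names are hypothetical and the check covers whitespace only, not the other character restrictions of the standard.

```python
def find_whitespace(term):
    """Return the positions of all whitespace characters in a unit term."""
    return [i for i, ch in enumerate(term) if ch.isspace()]


def validate_no_whitespace(term):
    """Raise ValueError if the term contains whitespace; otherwise return it."""
    positions = find_whitespace(term)
    if positions:
        raise ValueError(
            f"whitespace at position(s) {positions} in unit term {term!r}"
        )
    return term
```

With this check in front of the parser, "3 2" is rejected outright instead of being read as 32.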
  1. The BNF syntax for terms is somewhat confusing because of the following two paragraphs in the standard:

§8 integer numbers A positive integer number may appear in place of a simple unit symbol.

This is where <factor> alone is in the definition of <simple-unit>.

§10 nested terms  Unit terms with operators may be enclosed in parentheses (‘(’ and ‘)’) and used in place of simple units.

See #158 for detail resolution.

  1. The names of the conversion functions used in the "ucum-essence.xml" file are not always consistent with the names used in the html definition of the standard (e.g. "logTimes2" instead of "2lg").

That is currently not considered a defect; it's the difference between print name and computer language identifier.

  1. The first conversion function from the function pair "degf(5 K/9)" as declared in "ucum-essence.xml" seems to have the following definition (due to the scale factor in the declaration of the function): fF(x) = x - 459.67 rather than fF(x) = 9/5 x - 459.67. Or should "degf(5 K/9)" be "degf(1 K)"?

Hm, good point. We will need to check that. TODO.

  1. In the file "ucum-essence.xml", there is an inconsistency in the values of the "Unit" attributes of the following two units (one is specified as "rad" and the other as "deg"):

<unit Code="[p'diop]" CODE="[P'DIOP]" isMetric="no" isSpecial="yes" class="clinical">
  <name>prism diopter</name>
  <printSymbol>PD</printSymbol>
  <property>refraction of a prism</property>
  <value Unit="100tan(1 rad)" UNIT="100TAN(1 RAD)">
    <function name="tanTimes100" value="1" Unit="deg"/>
  </value>
</unit>
<unit Code="%[slope]" CODE="%[SLOPE]" isMetric="no" isSpecial="yes" class="clinical">
  <name>percent of slope</name>
  <printSymbol>%</printSymbol>
  <property>slope</property>
  <value Unit="100tan(1 rad)" UNIT="100TAN(1 RAD)">
    <function name="100tan" value="1" Unit="deg"/>
  </value>
</unit>

Hm, good point as well. We will need to check that. TODO.

  1. The canonical form of a term containing an arbitrary unit is defined in our tool as the term itself. This is based on the following paragraph from the standard:

§25 operations on arbitrary units Any term involving arbitrary units, is itself an arbitrary unit and is not comparable with any other arbitrary unit or term.

It would not be quite right, though. Consider "[BAU]/cm3" vs. "[BAU]/mL"; they are still equal.
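One way to picture this remark is to keep the arbitrary unit as an opaque symbol while still reducing the rest of the term to base units. The Python sketch below is only an illustration of that idea, not a rule taken from the standard; the `CanonicalTerm` type and the hand-built canonical forms are assumptions.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CanonicalTerm:
    magnitude: float    # scale factor relative to the base units
    dimensions: tuple   # sorted (base unit, exponent) pairs
    arbitrary: tuple    # sorted (arbitrary unit, exponent) pairs, kept opaque


def comparable(a, b):
    """Terms are comparable only if dimensions and arbitrary parts both match."""
    return a.dimensions == b.dimensions and a.arbitrary == b.arbitrary


def equal(a, b):
    return comparable(a, b) and math.isclose(a.magnitude, b.magnitude)


# Hand-built canonical forms: cm3 = (1e-2 m)^3 and mL = 1e-3 * 1e-3 m3.
bau_per_cm3 = CanonicalTerm(1 / (1e-2) ** 3, (("m", -3),), (("[BAU]", 1),))
bau_per_ml = CanonicalTerm(1 / (1e-3 * 1e-3), (("m", -3),), (("[BAU]", 1),))
au_per_ml = CanonicalTerm(1 / (1e-3 * 1e-3), (("m", -3),), (("[AU]", 1),))
```

Here `equal(bau_per_cm3, bau_per_ml)` holds, while `comparable(bau_per_ml, au_per_ml)` does not, because the two arbitrary units are never converted into each other.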

In the Regenstrief conversion tool, however, terms containing arbitrary units are evaluated where arbitrary units are treated as dimensionless units. Is this due to the tool being based on an older version of the standard?

The Regenstrief tool is old, not maintained and may have disappeared. It would not know of that newer rule.

  1. In the ucum-essence.xml file the value of arbitrary units is still defined like this:

<value Unit="1" UNIT="1" value="1">

But it does have the isArbitrary flag (at least it does now, may have been an old resolved problem.)

  1. According to the standard and the functional tests file, "[IU]" is not a valid unit. So why is it contained in table 19? It is also contained in the ucum-essence.xml file, but there is no way to deduce that it is an invalid code.

This may have been an older issue. [IU] upper case was more recently added.

  1. It would be nice if there was an XML schema for the ucum-essence.xml file.

But we don't have one nor intend to add one. In a world where people now move to JSON everywhere, schemas seem to have fallen in importance. This developer rarely needed them.

  1. Some typos/errors in Table 26 (Example Unit Terms by Term):
  1. In the row with unit term = "g/(kg.d)", the cells 3-7 have shifted one place to the right.

fixed

  1. Several errors in the canonical forms:

○ Canonical form value of "/kg{body'wt}" should be .001 and not 1000.

fixed

○ Canonical form values such as 1015, 1018, etc. should be 10¹⁵, 10¹⁸, etc.

fixed

○ Canonical form of "dB" is not correct (1 is the canonical form of "0.dB").

removed this entry for the time being

  1. In the UcumFunctionalTest.xml file the following strings are claimed to be valid units, which in my opinion is not correct:

see above, these have been updated.

  1. There are several examples where the Regenstrief conversion tool seems to be inconsistent with the standard, e.g. "2{spoon}" is not allowed, while "cl{spoon}" is allowed, which is inconsistent with §8 of the standard. I assume, as already expressed above, that this is due to the fact that the tool is based on another version of the standard.

see above, Regenstrief tool is largely out of date / defunct.

I hope these remarks will help in further improving the quality of the standard.

Thank you. And I am sorry it took so long to really mull them over.

comment:8 Changed 7 years ago by Gunther Schadow

So, what is still TODO is looking into the function pair definitions. It is probably just a cosmetic issue. I see in one of my implementations, the degF function includes the 5 and 9, and it would seem they really have to. So why do we even write it on the function arguments?

comment:9 Changed 7 years ago by Gunther Schadow

More checking. The function pair for degF that I implement is this:

pair degF {
  REAL ABSOLUTE := -273.15 * 9 / 5 + 32;  // = -459.67
  f_to  (REAL x) { return x + ABSOLUTE; }
  f_from(REAL x) { return x - ABSOLUTE; }
}

So all I am doing in the function is the subtraction. The multiplication and division of the value still needs to be done when applying the function.
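The split between the offset inside the function and the scaling outside it can be sketched in Python as follows. The names `UNIT_SCALE`, `degf_to_kelvin` and `kelvin_to_degf` are hypothetical, and reading the "5 K/9" argument as a scale factor applied outside the function pair is an assumption based on the description above.

```python
ABSOLUTE = -273.15 * 9 / 5 + 32  # approximately -459.67
UNIT_SCALE = 5 / 9               # kelvin per unit of the "5 K/9" argument


def f_to(x):
    """Offset only: value on the 5 K/9 scale -> degrees Fahrenheit."""
    return x + ABSOLUTE


def f_from(x):
    """Offset only: degrees Fahrenheit -> value on the 5 K/9 scale."""
    return x - ABSOLUTE


def degf_to_kelvin(t_f):
    # The function does the subtraction; the 5/9 is applied afterwards.
    return f_from(t_f) * UNIT_SCALE


def kelvin_to_degf(t_k):
    # The 9/5 is applied first; the function then adds the offset.
    return f_to(t_k / UNIT_SCALE)
```

Composed this way, the conversion from kelvin is 9/5 x - 459.67 overall, even though the function body itself contains only the offset.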

comment:10 Changed 7 years ago by Gunther Schadow

Resolution: fixed
Status: assigned → closed

It is not an error, because we say the function pair "degf(5 K/9)" is defined as fF(x) = 9/5 x - 459.67; the function pair's name is "degf(5 K/9)" with all these numbers. So the 5 and 9 stuff is included in what we define.
