Opened 7 years ago

Closed 7 years ago

#158 closed defect (fixed)

Grammar specification is self-contradictory

Reported by: Ferdinand Beyer
Owned by: Gunther Schadow
Priority: critical
Milestone: Version 2.0
Component:
Keywords: grammar specification
Cc:

Description (last modified by Gunther Schadow)

The specification at http://unitsofmeasure.org/ucum.html describes the grammar of UCUM expressions and should AFAIK be the reference for implementations.

Unfortunately, this specification is self-contradictory in some cases:

  • §10 (3) reads: "Since a unit term in parenthesis can be used in place of a simple unit, an exponent may follow on a closing parenthesis which raises the whole term within the parentheses to the power." This is inconsistent with the Backus-Naur grammar, where parentheses form "components", not "simple units". Since there seems to be no use case for expressions such as "(m/s)2", I suggest removing §10 (3) from the specification.
  • The description of the Backus-Naur grammar is out-of-date. It refers to the non-terminal "power" that does not exist, probably because of the inconsistency mentioned above.
  • The Backus-Naur grammar leads to a right-to-left evaluation order. For example, the expression "a/b/c" will be parsed as:

[EDIT: I (GS) edited this parse tree to represent what I understand is the salient point.]

           term
       /   |       \
component  |        \
    |      |         \
    |      |           term
    |      |        /    |    \
    |      |  component  |    term
    |      |      |      |      |
    |      |      |      |  component
    |      |      |      |      |
   "a"    "/"    "b"    "/"    "c"

...and therefore interpreted as "a/(b/c)" instead of "(a/b)/c".
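To make the difference concrete, here is a minimal sketch that evaluates a chain of divisions both ways. The function names `eval_right_recursive` and `eval_left_to_right` are made up for this illustration and are not part of any UCUM implementation; plain numbers stand in for the units.

```python
from fractions import Fraction


def eval_right_recursive(values):
    """Evaluate v0/v1/.../vn the way the right-recursive rule
    <term> ::= <component> "/" <term> groups it: v0 / (v1 / (...))."""
    head, *rest = values
    if not rest:
        return head
    return head / eval_right_recursive(rest)


def eval_left_to_right(values):
    """Evaluate v0/v1/.../vn as ((v0 / v1) / ...) / vn."""
    result = values[0]
    for value in values[1:]:
        result = result / value
    return result


# Stand-ins for a, b, c in "a/b/c".
operands = [Fraction(8), Fraction(4), Fraction(2)]

print(eval_right_recursive(operands))  # a/(b/c) = 8/(4/2) = 4
print(eval_left_to_right(operands))    # (a/b)/c = (8/4)/2 = 1
```

The two groupings agree only when the chain contains at most one division, so the choice matters for any expression such as "a/b/c".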

It is particularly unsatisfying that I found one case where the text wins over the formal grammar, and one where it is the other way around. With the specification in this form it is impossible to decide on the "UCUM conformance" of an implementation.

Change History (7)

comment:1 Changed 7 years ago by Gunther Schadow

Milestone: Version 2.0

Related are #4 and #54; it's time to resolve this.

comment:2 Changed 7 years ago by Gunther Schadow

Here is the salient complaint from #54:

  1. The BNF syntax for terms is somewhat confusing because of the following two paragraphs in the standard:

§8 integer numbers A positive integer number may appear in place of a simple unit symbol.

§10 nested terms Unit terms with operators may be enclosed in parentheses (‘(’ and ‘)’) and used in place of simple units.

I would expect that these two rules would have been incorporated in the BNF syntax, i.e. that <simple-unit> and <component> would have been defined like this:

<simple-unit> ::= <ATOM-SYMBOL>
                  | <PREFIX-SYMBOL><ATOM-SYMBOL> | <factor> | “(”<term>“)”

<component> ::= <annotatable><annotation>
                 | <annotatable> | <annotation>

This syntax includes "(3)2" as a proper unit term, although the rule from §10 of the standard strictly implies that the expression between parentheses should contain operators. The Regenstrief conversion tool accepts "(3)2" which evaluates to 9. So, should the first line of §10 not read as follows?

§10 nested terms Unit terms may be enclosed in parentheses (‘(’ and ‘)’) and used in place of simple units.

comment:3 Changed 7 years ago by Gunther Schadow

#4 has already been mostly resolved.

§7 algebraic unit terms [...] ■3 The division operator can be used as a binary and unary operator, i.e. a leading solidus will invert the unit that directly follows it.

So /a.b/c is clearly the same as 1/a.b/c or a-1.b.c-1.

The BNF seems not to reflect that:

<component> ::=	<annotatable><annotation>
            | <annotatable>
            | <annotation>
            | <factor>
            | “(”<term>“)”

<term> 	::= “/”<component>
        | <component>“.”<term>
        | <component>“/”<term>
        | <component>

And that was then resolved by adding main-term as a start symbol:

<sign>	::=	“+” | “-”
<digit>	::=	“0” | “1” | “2” | “3” | “4” | “5” | “6” | “7” | “8” | “9”
<digits>	::=	<digit><digits> | <digit>
<factor>	::=	<digits>
<exponent>	::=	<sign><digits> | <digits>
<simple-unit>	::=	<ATOM-SYMBOL>
                    | <PREFIX-SYMBOL><ATOM-SYMBOL>
<annotatable>	::=	<simple-unit><exponent>
                    | <simple-unit>
<component>	::=	<annotatable><annotation>
                      | <annotatable>
                      | <annotation>
                      | <factor>
                      | “(”<term>“)”
<term>	::=	<component>“.”<term>
              | <component>“/”<term>
              | <component>
<main-term>	::=	“/”<term>
                      | <term>
<annotation>	::=	“{”<ANNOTATION-STRING>“}”

comment:4 Changed 7 years ago by Gunther Schadow

Owner: set to Gunther Schadow
Status: new → assigned

Let's not worry about the right-to-left association for a moment.

The observation that the caption talks about "power" while the BNF does not actually define it is a good one. Maybe we should have it?

The exponent is definitely misplaced if we wanted to keep §10 (3):

<simple-unit>	::=	<ATOM-SYMBOL>
                    | <PREFIX-SYMBOL><ATOM-SYMBOL>
<annotatable>	::=	<simple-unit><exponent>
                    | <simple-unit>
<component>	::=	<annotatable><annotation>
                     | <annotatable>
                     | <annotation>
                     | <factor>
                     | “(”<term>“)”
<term>	::=	<component>“.”<term>
              | <component>“/”<term>
              | <component>
<main-term>	::=	“/”<term>
                      | <term>
<annotation>	::=	“{”<ANNOTATION-STRING>“}”

So the question is fair: should we just drop this rule? It is not being used anywhere currently, and I doubt that anyone understands it.

We need to make that an advisory and perhaps release this tentatively for public comments.

About the left to right association, this can be resolved easily:

<term>	::=	<term>“.”<component>
              | <term>“/”<component>
              | <component>
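As a rough illustration of what the left-recursive rule means for a parser, here is a minimal sketch. It makes several simplifying assumptions that are not in the specification: atoms are single letters, and prefixes, exponents, factors, annotations and the leading solidus are left out. The name `parse_term` is invented for this example. Because a recursive-descent parser cannot call itself on a left-recursive rule directly, the rule is implemented as a loop that folds each new component into the tree built so far.

```python
def parse_term(text):
    """Parse a simplified unit term into a nested tuple tree.

    Implements  <term> ::= <term> "." <component>
                         | <term> "/" <component>
                         | <component>
    with <component> limited to a single letter or "(" <term> ")".
    """
    pos = 0

    def component():
        nonlocal pos
        if pos < len(text) and text[pos] == "(":
            pos += 1
            node = term()
            if pos >= len(text) or text[pos] != ")":
                raise ValueError("expected ')' at position %d" % pos)
            pos += 1
            return node
        if pos < len(text) and text[pos].isalpha():
            node = text[pos]
            pos += 1
            return node
        raise ValueError("expected a component at position %d" % pos)

    def term():
        nonlocal pos
        node = component()
        # Each operator attaches the next component to everything
        # parsed so far, which yields left-to-right association.
        while pos < len(text) and text[pos] in "./":
            operator = text[pos]
            pos += 1
            node = (operator, node, component())
        return node

    tree = term()
    if pos != len(text):
        raise ValueError("unexpected input at position %d" % pos)
    return tree


print(parse_term("a/b/c"))    # ('/', ('/', 'a', 'b'), 'c')
print(parse_term("a/(b/c)"))  # ('/', 'a', ('/', 'b', 'c'))
```

With this grouping "a/b/c" comes out as "(a/b)/c", and the right-to-left reading is only obtained when the parentheses are written explicitly.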

comment:5 Changed 7 years ago by Gunther Schadow

Description: modified (diff)

comment:6 Changed 7 years ago by Gunther Schadow

Strike:

<verse> Since a unit term in parenthesis can be used in place of
a simple unit, an exponent may follow on a closing parenthesis which
raises the whole term within the parentheses to the power.
</verse>

And add a comment on the removed text:

			 <p>
Up until revision 1.9 there was a third clause 
&ldquo;Since a unit term in parenthesis can be used in place of
a simple unit, an exponent may follow on a closing parenthesis which
raises the whole term within the parentheses to the power.&rdquo;
However this feature was inconsistent with any BNF or other syntax
description ever provided, was never used and seems to have no 
relevant use case. For this reason this clause has been stricken.
This is a <emph>tentative</emph> change. Users who have used this 
feature in the past, should please comment on this deprecation. 
If we receive indication that this feature was used by anyone, we
would undo the deprecation. If no comments are received, the 
deprecation continues to take effect.
			 </p>

Strike the entire caption detail under the BNF, as it is badly out of date and only introduces more confusion.

The term vs. component associativity changes as indicated above.

comment:7 Changed 7 years ago by Gunther Schadow

Resolution: fixed
Status: assigned → closed

Done: [16412]
