Technical Specification for Format Specification for fprintf, sprintf, sscanf, etc.

The documentation gives a good overview for someone learning how to use format specifications to read and write formatted strings. However, it leaves a number of details and nuances unstated. Rather than run a parametric exploration of which format specs are valid, I would prefer a technical specification. Here is one example of something whose validity I could not infer from the online documentation:
How many flags may appear together? I had assumed only one, and then decided to explore.
Here are some that I tried:
>> parametricFmt
Command: aString = sprintf("%#5.0f", 25);
<edge> 25.</edge>
Command: aString = sprintf("%#-5.0f", 25);
<edge>25. </edge>
Command: aString = sprintf("%#+5.0f", 25);
<edge> +25.</edge>
Command: aString = sprintf("%#+05.0f", 25);
<edge>+025.</edge>
Command: aString = sprintf("%#+005.0f", 25);
<edge></edge>
Command: aString = sprintf("%#+05.0x", 25);
<edge> 0x19</edge>
Command: aString = sprintf("%#+5.0x", 25);
<edge> 0x19</edge>
Command: aString = sprintf("%+5.0x", 25);
<edge> 19</edge>
Command: aString = sprintf("%-5.0x", 25);
<edge>19 </edge>
Command: aString = sprintf("%#-5.0x", 25);
<edge>0x19 </edge>
Command: aString = sprintf("%#-05.0x", 25);
<edge>0x19 </edge>
Command: aString = sprintf("%#- 05.0x", 25);
<edge>0x19 </edge>
Command: aString = sprintf("%#+ 05.0x", 25);
<edge> 0x19</edge>
Command: aString = sprintf("%#+ 015.0x", 25);
<edge> 0x19</edge>
Command: aString = sprintf("%#+ 15.0x", 25);
<edge> 0x19</edge>
Command: aString = sprintf("%#+015.0x", 25);
<edge> 0x19</edge>
Command: aString = sprintf("%#+-015.0x", 25);
<edge>0x19 </edge>
Command: aString = sprintf("%#+#-015.0x", 25);
<edge></edge>
Any guidance would be appreciated.
Kind regards,
Will

Answers (1)

dpb on 3 Sep 2024
Edited: dpb on 3 Sep 2024
MathWorks does not publish its internal specifications; the documentation is all that is available. One can submit service requests for clarification and/or bug reports, and sometimes the documentation is clarified or expanded as a result.
As noted in the References section of the fprintf doc, MATLAB formatted I/O is based on the C standard library functions printf and scanf; the references cited are still the old K&R C and the 1989 ANSI C. MATLAB is compiled with modern compilers that follow recent C/C++ standards, but it does not call the C stdio library functions directly. It uses an internal version that has been modified/vectorized to handle array inputs, which the C stdio library cannot. Updates from recent standards are therefore not included, and the pertinent documentation for format operators is still the older references.
The <documentation> mentions that more than one flag can be used at a time, but only as a note in a top-level section on formatting; it is not repeated in the specific formatting-operator section linked from the function descriptions. That is surely an oversight and should be amended. Anyone who already knows some C would be aware of this; without that prior knowledge, I agree it takes a fair amount of digging to find out.
AFAICT, the C language documentation outside MATLAB also states explicitly only that zero or more of the flags can be used; I believe the implication is that one would not repeat the same flag, but I do not find that stated explicitly.
<A Linux printf man page> states "The character % is followed by zero or more of the following flags:" which is reworded in the MATLAB documentation as "You can specify more than one flag in a formatting operator".
The "go to" reference is <P J Plauger>, but I don't have a copy at hand to see whether the text mentions such a limitation.
My personal interpretation/opinion is that more than one of the same flag character is undefined and the compiler can do anything, but I am not a member of the Standard committee so my opinions are only that. :) There was a <Stack Overflow thread> on the subject a number of years ago with varying opinions and some example behavior of some compilers; some warned, others didn't.
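For anyone who wants to repeat the experiment on their own release, the test in the question can be re-run with a loop like the one below. This is only a sketch: the `parametricFmt` script itself is not shown in the thread, so the list `fmts` and the loop structure are assumptions, and what gets printed for the repeated-flag cases may differ between releases.

```matlab
% Hypothetical reconstruction of the experiment from the question.
% fmts is an illustrative selection: three specs with distinct flags,
% followed by two that repeat a flag ('0' twice, '#' twice).
fmts = {'%#5.0f', '%#+05.0f', '%#-5.0x', '%#+005.0f', '%#+#-015.0x'};

for k = 1:numel(fmts)
    aString = sprintf(fmts{k}, 25);   % format the same value each time
    fprintf('Command: aString = sprintf("%s", 25);\n', fmts{k});
    % Edge markers make leading/trailing padding and empty results visible.
    fprintf('<edge>%s</edge>\n', aString);
end
```

Comparing the lines for the last two entries against the first three shows how a given release treats a repeated flag.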
At the time referenced in the MATLAB documentation, the standard library was not part of the C Standard, and so wasn't covered explicitly. Undoubtedly MathWorks is still maintaining and upgrading the same base code it started with back then, with whatever behavior that takes, but I don't believe it is documented more explicitly anywhere, other than what Plauger may say. TMW won't be using his implementation regardless, so that would only describe what that and later versions may be required to do, not what the MATLAB implementation does.

4 comments

This is a great summary. Thank you so much for putting so much detail into your investigation of the expected behavior. My motivation for looking into this was that I intended to write my own regexp toolkit to validate format specs, and the first thing I noticed was the divide between MATLAB's proprietary fprintf and the open standard. I immediately wanted a more authoritative document than the online help. That kind of documentation is often geared toward onboarding. Very, very useful, but not helpful when one wants to assess whether something is right or wrong. :-)
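As a rough idea of what such a validator could look like, here is a minimal sketch. The function name `isValidFormatOperator` is hypothetical, and the grammar it encodes is an assumption pieced together from the online help, not a MathWorks specification: it checks a single formatting operator, treats a repeated flag as invalid (one reading of the discussion above), and ignores `%%`, `n$` identifiers, and length modifiers such as `%ld`.

```matlab
function tf = isValidFormatOperator(spec)
% Hypothetical validator for one formatting operator such as '%#+05.0f'.
% Assumed grammar: % [flags] [width] [.precision] [subtype] conversion.
% Simplification: the subtype (b or t) is accepted before any conversion.
    expr = '^%([-+ 0#]*)(?:\d+|\*)?(?:\.(?:\d+|\*))?[bt]?[diuoxXfeEgGcs]$';
    tok = regexp(char(spec), expr, 'tokens', 'once');
    if isempty(tok)
        tf = false;                 % operator does not fit the assumed grammar
    else
        flags = tok{1};             % the run of flag characters, possibly empty
        % Reject any flag character that appears more than once.
        tf = numel(unique(flags)) == numel(flags);
    end
end
```

For example, `isValidFormatOperator('%#+05.0f')` accepts the operator, while `isValidFormatOperator('%#+005.0f')` rejects it because the `0` flag appears twice. Because the flag run is matched greedily, a leading zero in a width is read as a flag, which is one of the ambiguities a real spec would have to settle.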
Separately, but very much related for me, the AI revolution has increased the popularity of regular expressions and tokenization, but I have also noticed that a lot of software, even commercial-grade software, applies pattern matching that is not robust. I have a long history with regular expressions. Instead of advancing, we as a community might be losing some of our chops when it comes to writing robust patterns. Also, I have been vetting the Regexp Toolbox and want to feel confident with it.
As a compliment, the treatment of strings vs. chars has become much, much better, and I am really happy about that. I've always thought that only computer scientists think of strings as arrays of chars. Everyone else thinks of them as atomic things, so having that formalized is pretty cool.
Thanks for looking into it. I'll keep my eye out for more documentation. If you need me to file anything anywhere, I would be happy to. Until then, have a good one.
Will
This is a bit of an aside but since you mentioned regular expressions, in release R2020b we introduced pattern objects that you can use with many of the text searching functions in MATLAB. This documentation page may be of interest to you.
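To give a flavor of pattern objects in this context, here is a small sketch that describes a formatting operator with them. The variable names and the simplified grammar (flags, width, precision, conversion character) are illustrative assumptions, not a complete description of what sprintf accepts, and the pattern does not reject repeated flags.

```matlab
% Sketch: a simplified formatting operator described with pattern objects
% (requires R2020b or later). The grammar here is an assumption.
flags      = asManyOfPattern(characterListPattern("-+ 0#"));  % zero or more flags
width      = optionalPattern(digitsPattern);                  % optional field width
precision  = optionalPattern("." + digitsPattern);            % optional .precision
conversion = characterListPattern("diuoxXfeEgGcs");           % conversion character
fmtPat     = "%" + flags + width + precision + conversion;

specs = ["%#+05.0f", "%-5.0x", "%5.0q"];
isOperator = matches(specs, fmtPat);   % true where the whole string fits fmtPat
```

Here `matches` requires the entire string to fit the pattern, so `"%5.0q"` is not accepted because `q` is not in the conversion list.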
It really wasn't a lot of time or research; I was already aware of the content, so I just wrote it down and looked up a couple of links. The only thing I hadn't seen before was the Stack Overflow thread.
Since MATLAB is a proprietary product, the user documentation is all that MathWorks is ever going to publish; it's not their job to make the language definition publicly available, although I have been known to complain that the doc falls short of being a definition.
As for regexp, anybody who is a guru there is a magician in my view; I can't even make simple expressions work, let alone anything robust!
Earl DeShazer on 9 Sep 2024
Moved: Voss on 10 Sep 2024
@Steven Lord Thank you for the reference. I have really enjoyed the regular expression tools. They are rich. As with everyone's implementation of regex, there are differences that require poking around, and there were some differences from Perl that I had to work through, but overall this is a job well done. Also, I want to say that I never realized how powerful char arrays were until this recent go-around. One doesn't need (?x) because one can just string some chars together. I really enjoy that.
@dpb I now understand your perspective on MathWorks' willingness to share a spec, and that you are, like me, a user of MATLAB and not a developer. That said, I know a few people over there and they are generally quite reasonable. IMHO, a spec is not an implementation document; it is a contract with the customer, or between team members, on what something should do or how it should behave. Without clear specifications, a customer is at risk when using the software. So hopefully more detailed info will be shared. Also, for the record, what I mean by spec and what someone else means by spec may be different. To me, implementation is proprietary; behavior, on the other hand, is the face one shows to the public. How could that be proprietary when one could just exercise it and see what it does? The real question is how much money it takes to provide that level of support. Probably less than fielding stupid questions, though. :-)
Thank you both for your responses.
Cheers

Release

R2024a

Asked:

3 Sep 2024

Moved:

10 Sep 2024