File access

From MZXWiki
Revision as of 05:18, 29 May 2008 by Wervyn (talk | contribs) (Finished basic file i/o section.)

MegaZeux has supported reading and writing data to files since version 2.60, and since then this capability has been augmented and improved to allow for the use of external resources in a number of different ways. Apart from counter interfaces to directly manipulate the contents of a file, file access encompasses specific interfaces for reading and writing robots and Robotic, MZMs, and entire MZX worlds.

The Basics

MZX supports basic file reading and writing through the use of specific keyword counters, which fall into two obvious categories: "fread" counters for reading, and "fwrite" counters for writing.

First, a file must be opened for reading/writing, using the following syntax:

set "filename.ext" to "fread_open"
set "filename.ext" to "fwrite_open"

Directory paths can also be used (MZX is OS agnostic and interprets both '\\' and '/' as separators), but for security reasons MZX will not allow access to directories outside the one the .mzx file is running from: a path may descend into subdirectories, but may not climb above that directory. It is also perfectly acceptable to have a file open for both reading and writing at the same time. Note, however, that "fwrite_open" clears the contents of whatever file it opens and starts editing a new, blank file. For this reason MZX also provides the "fwrite_modify" and "fwrite_append" functions, which preserve the contents of the file being opened. "fwrite_modify" places the write cursor at the beginning of the file, while "fwrite_append" starts writing at the end of the file.
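As a minimal sketch of the difference between the write modes, the following appends two bytes to an existing file without disturbing what is already there. The file name "log.txt" is hypothetical and the byte values are arbitrary; single-byte writes through "fwrite" are described later in this article.

```
. "Open for appending: existing contents are kept, writing starts at the end"
set "log.txt" to "fwrite_append"
. "Write the byte values for the characters O and K"
set "fwrite" to 79
set "fwrite" to 75
. "Close the file again (see the closing trick at the end of this section)"
set "" to "fwrite_open"
```

Opening the same file with "fwrite_open" instead would discard its previous contents before the two bytes are written.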

The cursor, or in other words the place in the file currently being read from or written to, is maintained in the counters "fread_pos" and "fwrite_pos". These are separate values that do not interfere with one another, since it is after all quite common to have one file open for reading and another for writing. The cursors are automatically advanced when read/write operations are performed on the file, but they can also be set manually to navigate around the file. Before the port, when counters were only 16 bits wide, there existed "fread_page" and "fwrite_page" counters for working with files larger than 64KB in size. Since counters became 32-bit, however, these have been rendered obsolete, and fread_pos/fwrite_pos are now absolute positions within the file. Additionally, this has allowed for a convenient way to seek to the end of the file, by setting the appropriate "_pos" counter to -1.
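A small sketch of manual cursor control follows. It assumes a hypothetical file "data.bin" that begins with a 16-byte header followed by data; the counter names "first" and "second" are illustrative only.

```
set "data.bin" to "fread_open"
. "Skip the 16-byte header by moving the read cursor directly"
set "fread_pos" to 16
. "Each read of fread returns one byte and advances fread_pos by one"
set "first" to "fread"
set "second" to "fread"
. "fread_pos is now 18; rewind to read the header from the start"
set "fread_pos" to 0
```

Because "fwrite_pos" is tracked separately, none of these assignments affect a file that is open for writing.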

Reading and writing actual data can be done in a few different ways. The simplest is to use the counters "fread" and "fwrite" as placeholders for data to be read from or written to the open file. "fwrite" must always be assigned TO (except for certain syntax related to strings, see below), but "fread" can be interpolated into Robotic code just like any other counter, and will be evaluated each time it is encountered. Evaluation, in this case, entails reading one byte from the file, and advancing fread_pos by one. So for example:

set "counter" to "('fread'+('fread'<<8)+('fread'<<16)+('fread'<<24))"

This would read four bytes stored in little-endian format from the file, and combine them together to form a single 32-bit value. (See the articles on expressions and bitwise math for more information on this code.)
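The reverse operation can be sketched in the same way. This assumes the value to store is held in a counter named "counter", as in the line above, and uses the expression operators 'a' (bitwise AND) and '>>' (right shift) to isolate each byte before writing it, lowest byte first.

```
. "Write a 32-bit value as four bytes in little-endian order"
set "fwrite" to "('counter'a255)"
set "fwrite" to "(('counter'>>8)a255)"
set "fwrite" to "(('counter'>>16)a255)"
set "fwrite" to "(('counter'>>24)a255)"
```

Masking with 255 is a precaution, so that the sketch does not depend on how "fwrite" treats values outside the range 0 to 255.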

This is a somewhat contrived application, since another way to read from and write to a file is to use "fread_counter" and "fwrite_counter". These were originally intended to conveniently read and write full counter values, instead of just bytes. Currently they are out of date, and work on 16-bit values instead of the 32-bit values MZX uses now. However, there are plans to update them to work on 32-bit values in the near future.

Finally, MZX provides reading and writing of arbitrary length strings. Originally, the method for doing this was very limited. The "fread" and "fwrite" counters were overloaded to work with string values. A quirk with the way strings are handled forced a nonstandard syntax here, however: instead of

set "fwrite" to "$string"

it was (and is) necessary to use

set "$string" to "fwrite"

This method is limited in its usefulness because it requires the use of a terminating character, which is not user-definable. Instead of choosing something sensible and widely used, like a null byte (i.e. 0x00), Exophase chose to make the terminator an asterisk (*). As a result, the "fread#" and "fwrite#" function counters have become preferred for their accuracy and lack of side-effects. The number provided to the counter indicates how many characters to read or write. So for example, "fread20" reads 20 characters into a string. Be aware that these counters are specifically overloaded for string manipulation, and so it is not possible to interpolate "fread20" into an arbitrary string, as might be done with "fread". The syntax is very strictly:

set "$string" to "fread20"
set "$string" to "fwrite&$string.length&"

The second line demonstrates the use of counter interpolation to simply write the string in its entirety. It can of course be used to write only part of the string as well, and with string offsets to write an internal substring (see the article on strings for more details on this sort of manipulation).

Of course, being able to read in a string up to a terminating character is a useful function that is currently non-trivial (though still fairly simple) to implement. One way to do this would be to incrementally build the string byte by byte. Alternatively, one can seek ahead in the file for the terminating character, count the number of characters read from the starting point, and then read the whole string in one command. The first method parses the file only once, but has to dynamically resize the string many times. The second method requires two passes over the portion of the file containing the string, but allocates space for the entire string only once. Since file operations are less costly than memory reallocations [A.N. citation/testing needed!], the second method is preferred, and is detailed in the following code:

set "local1" to 0
. "i.e. a null terminator"
set "$tmp" to "target_string" 
. "This sets up a string name to be passed by reference.  This is almost entirely a style thing in this case."
. "The only real benefit is that for a long string, there's less memory overhead since the string only has to be allocated once."
. "Still, it's a neat piece of code."
goto "#readstr"
...
: "#readstr"
set "local2" to "fread_pos"
: "readstr_loop"
set "local3" to "fread"
if "(('local3'='local1')o('local3'=-1))" = 0 then "readstr_loop"
. "This checks for the end of the file as well, since otherwise we could loop forever without an appropriate terminator."
set "local3" to "('fread_pos'-'local2'-1)"
. "The -1 accounts for the terminator itself, so that it won't be included in the string.  This can of course be changed."
set "fread_pos" to "local2"
set "$&$tmp&" to "fread&local3&"
. "This is the pass-by-reference trick.  The same thing can be accomplished by setting the target string to $tmp after the"
. "subroutine returns.  As mentioned before, this does cause the string data to be allocated twice."
set "local2" to "fread"
. "We need to read one more character to get past the terminator; this can be used as an extra return value to test whether"
. "the string terminated as expected, or at the end of the file."
goto "#return"

Of course, depending on the size of the data stored in the file, another popular method is to simply read the entire file into a string and parse the contents internally as needed.
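A sketch of that approach, combining the seek-to-end trick with "fread#", is given here. It assumes that reading "fread_pos" back after seeking to the end yields the length of the file in bytes, and that the file is small enough to fit in a single string; "data.txt", "size" and "$file" are hypothetical names.

```
set "data.txt" to "fread_open"
. "Seek to the end, then take the cursor position as the file length"
set "fread_pos" to -1
set "size" to "fread_pos"
. "Return to the start and read the whole file in one command"
set "fread_pos" to 0
set "$file" to "fread&size&"
. "Close the file again"
set "" to "fread_open"
```

From here the contents can be picked apart with ordinary string operations, without touching the file again.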

One final note: once you have finished working with a file, it is generally a good idea to close it and free up the handle for other programs to use. However, MZX does not provide an "fread_close" or "fwrite_close" function. Instead, because only one file can be open for reading and one for writing at a time, opening a new file releases the previous one, and the following trick is used to reset the handle:

set "" to "fread_open"
set "" to "fwrite_open"

Hackish though it may be, this closes the respective file by attempting to open a file whose name is the empty string, which cannot exist.