Written by:
With contributions by:
The home site of the project is at http://www.opensg.org; the source code is
managed on SourceForge. This document is written in simplistic HTML with lots
of blank lines to allow using CVS for management, diff to find changes made by
new contributors, and automatic reformatting to fit an 80-column editor.
Please keep it that way. If you want to add images, please send them to the
maintainer. Thanks.
This document is very much work in progress, so everything written is
subject to change. Furthermore it represents the goal of the system,
not the current status. Comments are in italic, sometimes in the
form of a list of random thoughts. You're welcome to add more, if you
can explain them to me.
Over time lots of scenegraphs have been written. They all have different advantages and disadvantages. The two most famous ones are OpenInventor[1] and Performer[2]. The first is very flexible and object-oriented, but has no multi-processing support and the design flexibility impacts its rendering performance. It has been developed on IRIX and been ported to other Unices and Windows. The second is strictly performance-oriented with a focus on vis-sim applications. It has an APP-CULL-DRAW-oriented multiprocessing model and is only available on IRIX and Linux.
Since 1995 SGI has been trying to build a new unified scenegraph that combines flexibility and performance: first as the Cosmo project, later as the OpenGL++ proposal and finally in the Fahrenheit project. All of them were canceled at one point in time, and at SIGGRAPH 99 SGI publicly stated that they were not trying to build another "people's scenegraph".
At SIGGRAPH 99 Kent Watsen (kent@watsen.net, NPS), Allen Bierbaum (allenb@iastate.edu, VRAC, Iowa State University) and DR got together to look for alternative scenegraphs, but couldn't find one that fit all our needs. Thus this project was started.
It should be able to drive a number of display systems, like multi-screen projection displays (Powerwalls, CAVEs) etc.; thus it needs to be able to handle multiple coherent views into a single scene.
It is not a Virtual Reality system, it's just a rather application-independent scenegraph. Thus support for routes a la VRML is beyond the scope of this project. Handling of interaction devices like trackers etc. is also beyond the scope of this project. There are already a number of toolkits that can handle these problems.
It is not seen as a necessity to support other low-level APIs. The importance
of specialized APIs like Glide is diminishing to zero with the advent of
hardware T&L in the low-end market. One major goal is portability, and OpenGL
support is becoming more common in PC graphics accelerators (thanks to John
Carmack and the Quake family of games), so that D3D support is not a killer
feature for a graphics system any more.
One problem with multi-API systems is that it either leads to a lowest common
denominator approach supporting only the intersection of the features, or to a
split feature set where a number of features are not portable. Neither is an
appealing situation.
For OpenSG the axioms are:
Class names should be nouns. Basic classes should use simple nouns, derived
classes the name of their base class plus their own name.
Examples: class OSGLight; class OSGDirectedLight;
Since class names should start with uppercase letters we use class
OSGLight; instead of class osgLight;.
Methods should use the <verb>[<adjective>]<noun> convention.
Examples: OSGLight::getColor(); OSGMaterial::getSpecularColor();
To simplify the document the prefix is not used here. It is implicitly set in front of every symbol used by the system.
Only a limited subset of Hungarian-style type notation is used. Enumeration types use an appended E to designate them, pointers use P. To simplify the use of reference-counted objects we could also use a smart pointer type for all pointers inside the system, which uses the SP suffix. Should we do that? As we have a special type for field container pointers anyway it wouldn't make much of a difference. But the cost of mandatory refcounting might be too high, so enforcing it wouldn't be a good idea. So I'd rather leave it for now. (DR)
At runtime the PrintVersion( ostream & stream ); function can be used to output the version of the library for reference purposes. It can also be queried using GetMajorVersion() and GetMinorVersion().
The library offers services for thread creation and basic threading tools
(semaphores, locks, barriers, process priority adjustment, processor locking
and process assignment). Ok, so some of them are not so basic, but should be
possible on all the target OSes anyway. Since we rely heavily on a fast
implementation of the multiprocessing stuff we will provide all the necessary
functionality. Additional functions like message queues, which are not needed
for the core system, have to be discussed.
base type      | OpenSG type
---------------+------------
unsigned byte  | UInt8
short          | Int16
unsigned short | UInt16
int            | Int32
unsigned int   | UInt32
long           | Int64
unsigned long  | UInt64
float          | Real32
double         | Real64
float[3]       | Vector3f
float[3]       | Point3f
float[4][4]    | Matrix4f
float[4]       | Quaternion
node pointer   | NodeP
The first simple types are just to guarantee a fixed size.
For type-safety there are different types for vectors and points. The
additional program overhead is small due to C++, and the difference enforces
a clear idea about what the data is used for.
The array types support all the usual conversion and mathematics functions.
These are only some of all the possible functions. These things tend to get
big...(DR)
Vectors can be added, subtracted, multiplied (scalar, cross, dot), incremented,
decremented and multiplied with matrices/transposed matrices/partial
matrices.
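A minimal sketch of the point/vector distinction and the usual operations
(names and members are illustrative, not the final interface): subtracting two
points yields a vector, adding a vector to a point yields a point, and adding
two points simply does not compile.

    #include <cmath>

    class Vector3f
    {
      public:
        float x, y, z;

        Vector3f(float xx, float yy, float zz) : x(xx), y(yy), z(zz) {}

        Vector3f operator*(float s) const
        { return Vector3f(x * s, y * s, z * s); }

        float dot(const Vector3f &v) const
        { return x * v.x + y * v.y + z * v.z; }

        float length() const { return std::sqrt(dot(*this)); }
    };

    class Point3f
    {
      public:
        float x, y, z;

        Point3f(float xx, float yy, float zz) : x(xx), y(yy), z(zz) {}

        // point - point = vector: the displacement between two positions
        Vector3f operator-(const Point3f &p) const
        { return Vector3f(x - p.x, y - p.y, z - p.z); }

        // point + vector = point: a position moved by a displacement;
        // point + point is deliberately not defined
        Point3f operator+(const Vector3f &v) const
        { return Point3f(x + v.x, y + v.y, z + v.z); }
    };

With these definitions (b - a).length() gives the distance between two points,
while a + b for two points is rejected by the compiler.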
Matrices store their type (identity, translation, orthogonal, general) and the
sign of the determinant in addition to the basic data to facilitate efficient
vector-matrix multiplications. They themselves can be scaled, translated,
rotated, multiplied, have their determinant computed etc.
Quaternions can be created from matrices and axis/angle tuples, they can be transformed into matrices, multiplied, normalized and interpolated (slerped).
It manages a bitmask using an alloc/free paradigm. Several bitmask manager instances are provided by the system for the different masks.
Bitmasks are 32 bit wide, for efficiency.
How big should they be? 32 might not be enough. How well are long longs handled on Intel processors? Should they use an arbitrary size? I'd like them to be constant size, as it makes life so much easier. Even at the danger of sounding like Bill Gates, 32 bit should be enough for everybody. And if not C++ encapsulation allows us to change it later.
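A rough sketch of the alloc/free paradigm (interface names are hypothetical):
the manager hands out single bits of a 32 bit mask and takes them back.

    typedef unsigned int UInt32; // the fixed-size type from the table above

    class BitmaskManager
    {
      public:
        BitmaskManager() : _used(0) {}

        // Allocate a free bit; returns 0 once all 32 bits are taken.
        UInt32 allocBit()
        {
            for(int i = 0; i < 32; ++i)
            {
                UInt32 bit = 1u << i;
                if((_used & bit) == 0)
                {
                    _used |= bit;
                    return bit;
                }
            }
            return 0; // exhausted: the hard-crash case mentioned elsewhere
        }

        // Return a bit to the pool.
        void freeBit(UInt32 bit) { _used &= ~bit; }

      private:
        UInt32 _used; // bits currently handed out
    };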
The volumes support creation around a set of vectors (for initial construction around geometry) and around a group of themselves (for hierarchisation). They can also extend themselves by a volume of the same type for incremental building. Conversion between different types of volumes is done by having a constructor taking an axis-aligned box as the parameter and featuring a cast operator to axis-aligned box. Is it too dangerous performance-wise to allow that? Maybe we could disable that cast for profiling? (DR) For some useful special cases direct casts are implemented. Volumes can be transformed by a matrix. Only conversion from and to axis-aligned boxes? (AR) For simplicity reasons, yes. Converting from everything to everything explodes fast. If you need special cases you're free to write them, I just don't want people to expect conversions to be free or fast. (DR)
To use them they can be intersected by rays, returning minimum and maximum intersection distance (if any). As all volumes are convex, these two distances describe the intersection completely. One of the main uses is visibility determination, so a conservative intersection test against frusta is implemented, too. As a simple basis for that a point-in-volume test is provided.
What's the most useful representation for rays? Point + vector, point + normalized vector + length, 2 points? I'd go for point + normalized vector + length, any problems with that?(DR) I agree.(AR)
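With the representation agreed on above, a minimal sketch (names illustrative):

    struct Ray
    {
        float origin[3];
        float direction[3]; // kept unit length by construction
        float length;       // extent of the ray

        // Point at distance t along the ray, 0 <= t <= length.
        void at(float t, float result[3]) const
        {
            for(int i = 0; i < 3; ++i)
                result[i] = origin[i] + t * direction[i];
        }
    };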
Volumes can have special states: empty, infinite, invalid and frozen.
Newly created volumes are empty. Extending a volume by an empty volume doesn't
change it, intersecting an empty volume never hits, transforming an empty
volume leaves an empty volume. The emptiness test is probably used very often,
thus it should be fast.
Infinite volumes are similar to empty volumes. Extending a volume by an
infinite volume doesn't change it, intersecting an infinite volume always hits,
transforming an infinite volume leaves an infinite volume.
Volumes are invalidated when their enclosed geometry is changed, but the volume
has not been recalculated yet. This is used for lazy hierarchical bounding
volume update. Invalid volumes are made valid by assigning a new value to them,
all other operations leave them unchanged.
Frozen volumes are used to enclose objects that change frequently but don't
leave a bounded area. Frozen volumes ignore invalidation requests and return
failure to these requests, thus hierarchical invalidation should stop at frozen
volumes. They can be intersected and transformed like normal volumes.
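A sketch of how the special states could interact with the basic operations;
the state names and the box layout are illustrative only.

    #include <algorithm>

    enum VolumeStateE { VolumeValid, VolumeEmpty, VolumeInfinite,
                        VolumeInvalid };

    struct BoxVolume
    {
        float        minP[3], maxP[3];
        VolumeStateE state;
        bool         frozen; // frozen volumes ignore invalidation

        void extendBy(const BoxVolume &other)
        {
            if(other.state == VolumeEmpty)
                return;                 // extending by empty: no change
            if(other.state == VolumeInfinite || state == VolumeInfinite)
            {
                state = VolumeInfinite; // infinity swallows everything
                return;
            }
            if(state == VolumeEmpty)
            {
                *this = other;          // first real contents
                return;
            }
            for(int i = 0; i < 3; ++i)
            {
                minP[i] = std::min(minP[i], other.minP[i]);
                maxP[i] = std::max(maxP[i], other.maxP[i]);
            }
        }

        // Invalidation request: returns false for frozen volumes, so a
        // hierarchical invalidation can stop there.
        bool invalidate()
        {
            if(frozen)
                return false;
            state = VolumeInvalid;
            return true;
        }
    };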
Error handling? Exceptions? Return codes? Some tests show that exceptions have no performance penalty anymore (at least with the SGI 7.3 compilers), so they seem to be the first choice for error handling right now.
optionally frame locked (for slower than life animations and recording), connected to scenegraph?
The current implementation allows mixing interfaces and adapting them (e.g. functors that call simple functions have an additional void * userdata parameter, or a functor can call a method or a simple function), but at the cost of a virtual function call. I'm not sure if that flexibility is really needed. Maybe we should build a functor with a single interface that doesn't need the virtual call. (DR)
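A sketch of the flexible variant (names hypothetical): a virtual base functor
plus an adapter for plain functions with a void * userdata. The
single-interface alternative would be just the function pointer and userdata
pair, saving the virtual call.

    class Node;

    class NodeFunctorBase
    {
      public:
        virtual ~NodeFunctorBase() {}
        virtual void call(Node *node) = 0; // one virtual call per invocation
    };

    typedef void (*NodeFunction)(Node *node, void *userdata);

    class NodeFunctionFunctor : public NodeFunctorBase
    {
      public:
        NodeFunctionFunctor(NodeFunction f, void *userdata)
            : _f(f), _userdata(userdata) {}

        virtual void call(Node *node) { _f(node, _userdata); }

      private:
        NodeFunction _f;
        void        *_userdata;
    };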
The actual values used are taken from the global default object Defaults. These values can be changed by the application, and will usually be changed by a system-specific module to reflect the most appropriate settings for the actual system.
This functionality is mainly used by the materials, to allow system-specific selection of optimal parameters.
This is not enough for all platform-specific optimizations. Some need more care and are hard to do on the fly, or change the data itself, like reformatting image or geometry data to the optimal format. These can't be done automatically, as the application might depend on certain data formats, even at the cost of performance. So for these an optimization action is provided, which optimizes the tree and the data.
The clean multiprocessing concept is one of the main new capabilities of OpenSG. There have been attempts at this before.
Performer uses a separate scenegraph for every process that is used. Changes
between processes are communicated via a changelist that collects the ids of
all nodes that have been changed since the last sync.
Our own scenegraph Y uses a more specialized concept, replicating only
those fields that are needed by the downstream processes. The data is
copied between the fields at sync time depending on a dirty flag. This
necessitates a traversal to catch all the changed nodes, which might be
expensive for large graphs.
The MP model for OpenSG is more general and flexible. It uses a mixture of both concepts, based on a completely shared process model like pthreads. Nonetheless there is a compile time option to switch the multithread support off. This can be used for applications that don't want or need MP support, and equally important for benchmarking the overhead of the multithread support.
Every field has a number of aspects. The maximum number of aspects is
determined at initialization time. Doing it at compile time would make
things a bit easier and probably more efficient due to being able to use
fixed-size arrays, if the upper bound of the number of aspects can be
sensibly defined. Which it probably can't. :( It might be an interesting
experiment to try both and see how they work. As the implementation is hidden
inside the base field container out of sight of the applications, that's what
we're going to do. (DR) With the current model of replicating field containers
it's not as bad, init-time should be ok. (DR) Every thread is bound to one
aspect. When accessing the field data the version that is assigned to the
currently active aspect is returned/changed.
The standard case will be a different aspect for every thread. That is not
enforced, though, so a careful application can have multiple threads working on
a single aspect. This is possible because the ChangeList is separate for every
thread, so after the task is done they can be joined. The application has to
take care that only one thread gets to write a certain field, though, or to
join the ChangeLists in the right order, whichever that may be. This allows
multiple processors to work on a task without using up lots of aspects and
having to synchronize all of them to get the result together. A typical case would be
calculating vertex normals or striping a whole tree, which can be distributed
among a set of worker threads, probably using a work queue for a
producer-consumer scheme.
But this general method means that every field access needs to identify the
current thread's active aspect. It would be possible to leave the responsibility
for carrying this current aspect id around on the shoulders of the application.
But this makes applications a lot more complicated to write and furthermore it's
prone to errors that are very hard to identify.
Thus the system needs an extremely efficient way of finding the current aspect
id. A system call would probably be too expensive, a thread-specific variable
would be better. As the aspect id of a thread will change very rarely, it should
be possible to force an optimizing compiler to keep the value in a processor
register, giving very low overhead compared to a simple single value scheme.
IRIX and most other Unices should supply thread-local data as part of
the thread descriptor. IRIX does, for the others we are still looking,
but are optimistic. Linux doesn't, though, and we hope that will change
in the future. Right now there is a solution for Linux that works, but
not at full performance. Apparently Linus is objecting to a more
efficient approach; maybe that will change in the future. NT has a fast
method for access to a low-integer thread identifier too, so the
important bases are covered. (DR)
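With today's C++ the per-thread aspect id can simply live in thread_local
storage; the discussion above predates that, so take this as a sketch of the
intent rather than the mechanism.

    namespace
    {
        thread_local unsigned currentAspect = 0;
    }

    inline unsigned getCurrentAspect()            { return currentAspect; }
    inline void     setCurrentAspect(unsigned id) { currentAspect = id; }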
An open problem is the memory organization of the multi-buffered fields.
The simple way is to put a copy of the data in the field. But that makes the
field rather big, thus more cache-unfriendly, and gives rise to severe
cache-consistency work. But it makes field handling very easy, as pointers to
fields are valid in every thread.
The alternative is embedding the fields in a larger structure and replicating
these larger structures. The fields used by nodes are embedded into field
containers anyway (see field container). The
problem is the inability to easily use a pointer to the field any more. This
can be overcome by defining a FieldPtr class that wraps the
container/field pair to access fields. The most common access to a container's
fields should be via the container's access methods, so the additional overhead
of keeping the container in the field pointers should easily be compensated by
the generally more efficient field data access, thus this alternative is the
preferred one right now.
The consequence is that there can be no naked fields, every field will have to
be part of a field container.
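A sketch of what such a FieldPtr could look like (all names hypothetical): it
names a field by a (container, field id) pair instead of a raw address, so
every access can be routed through the aspect-correct container copy.

    class Field;

    class FieldContainer
    {
      public:
        virtual ~FieldContainer() {}

        // Return the copy of the given field for the given aspect.
        virtual Field *getField(unsigned fieldId, unsigned aspect) = 0;
    };

    // Stand-in: per-thread aspect id as sketched in the threading section.
    inline unsigned getCurrentAspect() { return 0; }

    class FieldPtr
    {
      public:
        FieldPtr(FieldContainer *c, unsigned fieldId)
            : _container(c), _fieldId(fieldId) {}

        // Resolve to the calling thread's aspect copy on every access.
        Field *get() const
        { return _container->getField(_fieldId, getCurrentAspect()); }

      private:
        FieldContainer *_container;
        unsigned        _fieldId;
    };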
Every field is uniquely identified. This could be done by the pointer or by a unique integer. Given the increasing importance of distribution for clustering and collaborative work they will have a unique numeric id, even though the ChangeLists will probably use pointers. (DR) Every thread keeps a list of the ids of recently changed fields. When a field is changed its id is appended to that list. These change lists are managed in the ChangeList class.
ChangeLists keep a list of the fields that have been changed in the active thread since the last synchronization. Every field change is automatically appended to the active ChangeList. But every field should only be in the list once, even if it was changed multiple times, so the data is only copied once. Thus the change list needs a fast way to reject already entered ids, or to unify them quickly before synchronizing. There are a number of ways to do that. The easiest is keeping a changed bit in the fields that is set at change time and cleared at sync. Keeping the change list sorted is another way, or sorting it at sync time and eliminating duplicates. Keeping the change list as a bitfield indicating changed fields is not going to be efficient, as the expectation is that only a small percentage of all fields are going to be changed between syncs. I don't really like the bit-per-field idea, as keeping a single bit per field per aspect is not going to be efficient: those bits will increase the field size, probably by a full word. Gut feeling says it's going to be the most efficient way, though, but sorting at the end might not be all that bad. It heavily depends on the length of the change lists in practical applications. Statistics and benchmarking will hopefully answer that. A number of alternatives could be implemented as subclasses of ChangeList and either selected by the application or maybe even exchanged at runtime depending on the size and frequency of changes. (DR)
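As an illustration, a sketch of the sort-at-sync variant, which needs no extra
bits in the fields (interface names are hypothetical):

    #include <algorithm>
    #include <vector>

    class ChangeList
    {
      public:
        // Append blindly at change time; duplicates are allowed here.
        void addChanged(unsigned fieldId) { _changed.push_back(fieldId); }

        // Unify duplicate entries just before synchronization.
        void compact()
        {
            std::sort(_changed.begin(), _changed.end());
            _changed.erase(std::unique(_changed.begin(), _changed.end()),
                           _changed.end());
        }

        const std::vector<unsigned> &changed() const { return _changed; }

        void clear() { _changed.clear(); }

      private:
        std::vector<unsigned> _changed;
    };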
CLs need to support appending field IDs, obviously, and synchronizing an aspect to another aspect and its changelist. This also has to work both ways, so that two threads can sync each other without creating a change storm, where changes are shuttled back and forth between threads, creating all-encompassing CLs. But changelists also need to be joined without any synchronization. The need for this arises from the above mentioned splitting of a task across multiple threads working on one aspect. But at least equally important is a master process keeping several CLs to keep multiple independently running asynchronous threads consistent. It does that by keeping a CL for every worker thread. When a worker asks for a sync, the master's CL is appended to all workers' CLs. Then the worker's private CL is appended to all other workers' CLs, after which the master's aspect is synced with the worker using the storm-free sync mentioned above. This allows multiple independent asynchronous threads to work at their own speed.
This change list concept works rather efficiently if every thread uses the same scenegraph. But as soon as there are threads for specialized tasks that are only interested in a small part of the scenegraph, they will have to handle a whole lot of change messages they are not interested in. To facilitate filtering, a thread can attach a discrimination functor to its aspect. This functor is invoked when a change is to be integrated or synchronized with the aspect, and can reject changes the thread is not interested in. This decision can be based on whatever algorithm is implemented by the user. Should we make this easier? Here the unique integer field id might be handy as an index into a bitfield. Another idea would be giving the fields types that can be discriminated against. Which types is a very open question... Alternatively an interest mask similar to the node's traversal mask could work. Using a general allocation/deallocation handling for the bits would allow a pretty efficient discrimination. That crashes hard when the available bits are exhausted, though. (DR)
What about producer-consumer changelists that allow overlapped synchronization (see Performer's CULLoDRAW)? Is that something special or is it just frequent synchronization? I think it's the latter, so a special case for that is not needed. (DR)
Besides being changed, fields can also be created and deleted, both of which are handled by the change lists, too. As there are no naked fields, this happens at the level of the field container objects. Field containers can only really be deleted after all aspects that use them have stopped using them. This is accomplished by incrementing their reference count in a new aspect on synchronizing with the change list of the creating thread. On accepting a deletion change the reference count is decremented, and the field is deleted when it reaches zero.
Multi-buffered fields and ChangeLists are the most important data structures, as they are used by every access to the data. We should take extreme care to get them right. As that is next to impossible to do without writing real applications using them we will take care to leave as many alternatives open as we can without compromising efficiency.(DR)
So far the stored data has been treated very abstractly, or rather, not at all. There are three variants of the abstract Field that store the data differently.
The simplest is the SingleField<datatype>. It just stores the real
data of the value to be stored in the field. This is usefully applicable to
small, fixed size data like integers or real values. At sync time the data is
just copied between aspects, so the data has to be small, and as it is
integrated into the structure by the virtue of templates, it has to be fixed
size. As the real data is copied, this type of field should be used for
pointers only with care, as the pointed-to data is not copied.
Access is done using the <datatype> getValue(); and
setValue(<datatype> val); methods.
For all the basic types there are corresponding field types.
For pointed to data in the form of variable sized arrays there is the
MultiField<datatype>. Reading them is done by
<datatype> getValue(uint32 index);. Changing them is a bit
more complicated.
To prevent useless data copies the array fields employ a copy on write
strategy. Making that efficient without having to check the validity of
the current array and appending to the active change list for every
write access is most easily done by bracketing the write accesses.
Before writing the data beginEdit(hints) has to be called. At that
point a private copy of the array data is made, if one is not associated with
the current thread already. Sometimes it's not needed to actually copy the data,
as all of it is going to be recalculated anyway. Thus by giving the right
hints to beginEdit() a copy is not made, just uninitialized
new memory is allocated. To find out if a new copy is needed the arrays are
reference-counted, so if only one user is attached to an array, no copy needs
to be made. Should there be hints that indicate that only some elements
changed? This would make sense if the buffer to be synchronized with has pretty
much the same data already, so that only some elements have to be copied. The
only problem is how to know that? I tend to say it's the application's
responsibility to create separate fields/objects for data that changes often
and data that doesn't. That might be a good idea anyway, as static data could
be optimized more aggressively (OpenGL display lists etc.).
At endEdit() time the field is appended to the change list. This
prevents synchronizing with partially changed and inconsistent fields, even
though it should be considered a bug to synchronize before closing the open
fields. What about nested open/closes? We could allow it by keeping a
counter for the number of opens/closes, which would also allow tracking
unclosed fields. Right now I think that's not necessary, but it might be an
option in the future. (DR)
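A sketch of the bracketed write protocol, with a minimal stand-in for the
field class (hint values and method names are illustrative, not the final
interface):

    #include <vector>

    enum EditHintE { EditKeepData, EditOverwriteAll };

    template <class DataT>
    class MultiFieldSketch
    {
      public:
        void beginEdit(EditHintE hint)
        {
            // Real version: grab or create a private copy of the array for
            // this thread; with EditOverwriteAll the old contents are not
            // copied, only uninitialized memory is allocated.
            (void) hint;
        }

        void endEdit()
        {
            // Real version: append this field to the active ChangeList,
            // so no partially edited field is ever synchronized.
        }

        void     resize(unsigned n)                   { _data.resize(n); }
        unsigned size() const            { return (unsigned) _data.size(); }
        DataT    getValue(unsigned i) const           { return _data[i]; }
        void     setValue(unsigned i, const DataT &v) { _data[i] = v; }

      private:
        std::vector<DataT> _data;
    };

    // Usage: everything is recalculated, so no copy of the old data is made.
    void recalcHeights(MultiFieldSketch<float> &heights)
    {
        heights.beginEdit(EditOverwriteAll);
        for(unsigned i = 0; i < heights.size(); ++i)
            heights.setValue(i, 0.0f); // real code: compute the new value
        heights.endEdit();
    }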
One important aspect is locking. MP-hard reference counters need to be
locked. The system-dependent utility library should provide basic locks, but
how many? One for all fields? Will become a big bottleneck. One for every
field? Too expensive; SGIs have a limited number of hardware locks (do they
still? I know it was 4096 some time ago). Locking might be useful for general
structures, so I'd propose creating a general locking facility based on the
pointer of the object to lock.
Some address bits should be used to index into an array of locks. If the locks
become a bottleneck the array can just be increased in size and the load
distributed over more locks. Probably the bits 0xf80 are the best, as
they hopefully change a lot in indeterministic ways, but are not influenced by
double-aligned structure placement, and 32 locks sounds like a nice compromise
to start with. As in most other cases, there should be a way to gather
statistical information about usage and contention, so the number can be
optimized. (DR)
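A sketch of that proposal, using std::mutex as a stand-in for the
system-dependent lock: bits 0xf80 of the address (bits 7..11) select one of 32
locks.

    #include <cstdint>
    #include <mutex>

    class LockPool
    {
      public:
        void lock  (const void *obj) { _locks[indexOf(obj)].lock();   }
        void unlock(const void *obj) { _locks[indexOf(obj)].unlock(); }

      private:
        static unsigned indexOf(const void *obj)
        {
            // Bits 0xf80: variable, but immune to double alignment.
            return (unsigned) ((reinterpret_cast<std::uintptr_t>(obj)
                                & 0xf80u) >> 7); // 0..31
        }

        std::mutex _locks[32];
    };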
Element management is done using STL vectors. The STL vector interface is exposed in large parts to make using them easier. Accessing data is done by getValue(int index) and setValue(int index, value). Neither checks the index, for efficiency reasons. There should be a debugging version of the library that does, though. (DR) If needed, the pointer to the data can be accessed using getValues(); this is not recommended though, as writing over bounds can crash the system.
Should we keep the arrays that have a reference count of zero as spares to be used when a new array is needed? This will prevent frequent allocation of array memory when a new array is needed. This can rather easily be done by letting the field containers create a spare aspect that keeps the currently unused arrays around. As the typical case is change in one thread, synchronize with another thread (freeing one copy), and then changing again in the first thread (needing a new copy) just keeping one spare around will handle the most common cases. (DR)
IrisGL/OpenGL introduced the idea that the rendering system doesn't have to own the data. To integrate fast rendering into another, possibly existing system it shouldn't. To allow that there is a MultiField variant that does not allocate the data itself, but rather uses user-supplied functors to do that. These functors can return pointers into application data and thus prevent data duplication.
flux idea: multiple initialized buffers, changing only some parts,
attached to frame number? Seems like an application, not a core feature right
now.
Field Container
To allow more efficient aspect management fields have to be enclosed by
field containers. The different aspects are not kept in the fields, but
rather in the field containers. This keeps the data for the different
fields of an aspect close together, alleviating cache problems. The
different copies of the field container are kept in a contiguous block.
Pointers to a field container point to the beginning of that block and
contain the size of a single aspect, so it's easy to get to the
relevant data without dereferencing the base first.
But as a consequence, standard C/C++ pointers don't work anymore, as they have to be manipulated to point to the copy used by the aspect. Actually, that's not quite true. C/C++ pointers can be used and are valid, as long as they are only used in one aspect. If the application can guarantee that the aspect is not going to be changed, it can work with standard pointers. Just don't come running if your application breaks in unpredictable ways... ;)
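A sketch of the pointer arithmetic involved (names hypothetical): all aspect
copies live in one contiguous block, so the copy for an aspect is found from
the block start and the single-aspect size.

    #include <cstddef>

    class ContainerPtr
    {
      public:
        ContainerPtr(char *block, std::size_t aspectSize)
            : _block(block), _aspectSize(aspectSize) {}

        // Address of the container copy belonging to the given aspect.
        void *forAspect(unsigned aspect) const
        { return _block + aspect * _aspectSize; }

      private:
        char        *_block;      // start of the block of all copies
        std::size_t  _aspectSize; // size of a single aspect's copy
    };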
Field containers are the lowest level sharing group. Single fields cannot be shared between structures, i.e. pointers to fields are not used to reference data from multiple positions. The problem with sharing is the need to inform all users of changes. To do that the sharable unit needs a list of users. Having that for every field is too much overhead, IMHO, as most fields will not be shared. (DR) Thus sharing can only be done on the field container level.
The container classes store information about the fields' names and
types in a type class. These field descriptions can be used to access
the fields of a container by name. To do that they keep an access
method for the field. The field descriptions can be extended by an
application to attach additional data about a field (e.g. whether a
field should be used for reading or writing).
The field containers themselves are typed. These types are organized
hierarchically to allow creation of new container types based on
existing types. Only single inheritance is allowed. This type
information is also used by the traversal mechanism.
There is a problem with changing field containers. Via the named access, applications can get access to the fields of a field container and can change them. If the field container keeps data that is derived from these fields, this data needs to be invalidated and/or updated when the fields change. There are two ways to do that: either the field does it, or the user does it. If the field wants to do it, it has to know its field container (every field has one, as there can be no naked fields). But field container pointers are rather big and would double the size of a standard Int32 or Real32 field, which is a noticeable difference for a feature that is probably not going to be used very often. Thus the second way, and the preferred way right now, is to leave the burden on the shoulders of the application. After changing fields of a field container, the application has to call the changed() function of the field container and indicate the fields it changed, allowing the field container to react.
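A minimal stand-in illustrating the protocol (names and masks are
illustrative):

    #include <cstdint>

    class DemoContainer
    {
      public:
        static const std::uint32_t ColorFieldMask = 1u << 0;

        float *editColor() { return _color; } // generic/named field access

        // The container invalidates data derived from the reported fields.
        void changed(std::uint32_t whichFields)
        {
            if(whichFields & ColorFieldMask)
                _derivedValid = false; // lazily rebuilt on next use
        }

      private:
        float _color[3];
        bool  _derivedValid;
    };

    void setRed(DemoContainer &c)
    {
        float *col = c.editColor();
        col[0] = 1; col[1] = 0; col[2] = 0;
        c.changed(DemoContainer::ColorFieldMask); // let the container react
    }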
There are several classes derived from field container to serve different needs throughout the system. They are explained later.
To allow that, system classes must be replaceable at runtime by different, extended classes. This demands a dynamic generative approach; OpenSG uses the Prototype pattern to achieve it.
The higher level system classes have protected constructors and cannot be
instantiated directly. The system keeps a prototype for every class that is
used to create new objects. These prototypes are kept in a prototype manager,
which functions as a factory.
As the prototypes are used as the basis for new objects they should be
initialized with default values and should represent an empty object. The used
attributes will probably be changed directly after creation anyway, so
there's very little point in providing complicated (and expensive to clone)
defaults.
At runtime the application can exchange prototypes and thus change the type of every object created by the system, e.g. by file loaders. The dynamic extension of unsuspecting applications can be realized by having the application load extension modules that create the new prototypes. This should not be done automatically by OpenSG, as some applications might depend on specific features and limitations of the built-in types and would not react gracefully to new objects. A command-line option that can be handled by the system initialize function is probably a useful approach.
Prototypes have some advantages compared to a simple factory. They are objects of their class, after all, so they allow access to static methods like type queries. For overridden prototypes these will return the correct type of newly created objects, even if the type did not exist at compile time.
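A sketch of the prototype manager described above (names hypothetical):
objects are created by cloning a registered prototype, so exchanging the
prototype changes the type of everything created afterwards.

    #include <map>
    #include <string>

    class Prototyped
    {
      public:
        virtual ~Prototyped() {}
        virtual Prototyped *clone() const = 0; // create a fresh copy
    };

    class PrototypeManager
    {
      public:
        // Registering under an existing name replaces the prototype.
        void registerPrototype(const std::string &name, Prototyped *proto)
        { _protos[name] = proto; }

        // Factory entry point, used e.g. by file loaders.
        Prototyped *create(const std::string &name) const
        {
            std::map<std::string, Prototyped *>::const_iterator it =
                _protos.find(name);
            return it == _protos.end() ? 0 : it->second->clone();
        }

      private:
        std::map<std::string, Prototyped *> _protos;
    };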
To extend objects, mostly the decorator pattern is used. It has two major advantages: New decorators can easily be loaded dynamically and added to applications that know nothing about them (by registering a prototype that creates pre-decorated objects), and they can be cascaded, i.e. two decorations that don't necessarily know about each other can be attached to an object sequentially. To allow the concrete decorators to find themselves in the decorator chain, objects that can be decorated have a getComponent() method to access an eventual next decorator in the chain. Not a great name, but the one used in the original text. I'd like getDecorated() better. (DR) For non-decorated objects it returns NULL.
Decorators have disadvantages, though. It is pretty much impossible to decorate an existing object, as all the pointers pointing to the original would have to be redirected. And decorators are not cheap, as they add a level of indirection and force all functions that might be changed to be virtual. So for extensions that are just going to be used in one application deriving from the standard classes would be more efficient.
The performance cost is not nice, but it's only incurred on decorated objects, and other patterns I know are not as flexible and dynamic. Are there other, better patterns for run-time extensions?(DR)
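For concreteness, a sketch of a cascadable decorator (the interface is
hypothetical): the decorator forwards everything by default, and
getComponent() lets a concrete decorator find others in the chain.

    class Drawable
    {
      public:
        virtual ~Drawable() {}
        virtual void draw() = 0;
        virtual Drawable *getComponent() { return 0; } // NULL: no decorator
    };

    class DrawableDecorator : public Drawable
    {
      public:
        explicit DrawableDecorator(Drawable *inner) : _inner(inner) {}

        virtual void draw() { _inner->draw(); } // forward by default
        virtual Drawable *getComponent() { return _inner; }

      private:
        Drawable *_inner; // the decorated object (or the next decorator)
    };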
Another function related to object creation is mutation of an existing object to a differently typed one. One example is changing an existing group node to a switch or LOD node. This could be done by adding all the children of the existing node to a new node, removing the old node from its parent and adding the new node to it. The disadvantage of that approach is the invalidation of all pointers to the old node. The backbone/core concept (see Data reuse) allows a different approach. Only the core container is exchanged, the backbone part is left in the tree. To facilitate that a constructor function that accepts a node to mutate is available. These constructors have to check if the node they are going to mutate is of a type they are derived from or of the same type as themselves, so that access functions used on the old node can still be used on them. In general a single class derivation should be enough to derive new objects. Having parallel class hierarchies that have to be kept consistent is a heavy burden that we wish to avoid.
Another aspect is the use of different scenegraphs for different semantics. One scenegraph might have a structure that is optimized for efficient rendering, i.e. use a hierarchy of groups for culling. Another scenegraph might be organized in a more logical way, i.e. group all the elements of one type like all the screws in a car under one parent. A somewhat inverse aspect is having different representations of an object for different tasks. This is handled by the Alternative node.
Classically this was done by allowing multiple group nodes to use a node, or seen the other way round, by allowing a node to have multiple parents. Inventor has no parent concept, Performer explicitly allowed access to the different parents.
This multi-parent model has some disadvantages, though. Very often one needs to know the parent of a node, e.g. to accumulate the transformation on the way to the root node. Inventor was very consistent in this respect: there is no parent pointer, so to identify a node the whole path from the root to the node is needed. Performer has parent pointers, but which one to use? It depends on the scenegraph you're interested in, so you always need to know that, in addition to the node.
Another problem with multiple parents is the inability to store any kind of derived data at the node. For example storing the accumulated matrix to the world is impossible, as there are possibly many ways to get to the root. The same problem impacts the use of bounding box hierarchies for culling and makes the use of names to identify nodes impossible.
Looking at the distribution of the amount of data in typical scenes the biggest part is in the geometric data like vertices, normals, colors and texture coordinates. Thus the biggest savings in terms of memory reuse are achieved by sharing this geometric data. This can most easily be realized by allowing the user to define the arrays of geometric data to store in the Geometry nodes. But this leaves the responsibility for MP safety on the shoulders of the user, so it's not a good solution.
The next step is to put all the geometry's attributes in a separate structure and allow that structure to be used by multiple geometry nodes. This strategy was used pretty successfully by Y.
Applying the same principle to other types of nodes gives other useful results. Switch data for example can be used to switch a number of nodes simultaneously. Thus the principle is applied to all nodes in OpenSG.
The backbone part of a node carries all the data that depends on the position in the scenegraph, like the parent, the children and derived data, and the name and type of the object. This backbone part is the same for all types of nodes, even though the leaf nodes (Geometry) ignore the children. I hate that. But I don't have a real reason for it, it's just my sense of aesthetics... (CK)
All other sharable data is collected in a nodetype-specific container that can be used by multiple backbones.
The big problem: this doesn't allow adding or deleting a child to one
instance so that the change automagically appears in all other instances. How
important is that feature? I can't really think of a killer app right now, but
of several interesting little ones. Memory-wise the backbone should be as
light as possible, so that the overhead of having multiple backbones for a
node doesn't hurt.
We could work around that limitation by having links between the different
nodes representing instances of a node and explicitly mirroring add/delete
actions between them. But that would have to be done for every node
separately, which might add some noticeable overhead to tree changes. I don't
really have a solution for that. I really dislike having multiple parents, but
I can see cases that would benefit. (DR)
When two extensions should be used together, derivation has a problem, as a node can only have one type. Diamond derivation would make the type system very complicated, more complicated than we would like to handle.
In many cases derivation is too big a concept. Many applications would be happy just to add data fields to the nodes. Attachments allow that.
An attachment is a special kind of field container (which means it can be shared between nodes) that can be attached to a node. Nodes keep a map from a string key and an integer to an attachment pointer. Usually there should be only one attachment of any kind at a node; the integer allows adding a specified number of them, if that's needed.
This also means that the core's fields cannot usually be accessed directly from the node. To simplify that, and to hide the distinction between node and core, a templated typed node pointer is added that can be cast to the respective core.
OpenSG uses a dynamic visitor pattern similar to the one used by Inventor and Cosmo3D to allow simple adding of new node classes and actions.
The types of the node's core can be used to inquire inheritance relationships between classes, i.e. check if a given node is derived from another type to allow typesafe casts.
The node core's type is used to index a table of functions for every action. The action tables are stored by the different action classes themselves. They are by default initialized from the class's parent and can be overridden for every action instance, based on the node type, to allow easy customization. The action functions are stored as functors.
The action functions for group nodes should not recurse themselves
(i.e. call the traverser on their children). Some actions might use
different action functions, e.g. an intersection optimized to handle
multiple rays would first call a volume-node intersection function and
call the separate ray tests only if it's successful. Other traversers
might want to do a breadth first action to distribute work among
different processors. Group nodes should read and store the information
about which children to traverse in the action instead. To keep the
common case of using all children efficient this is taken to be the
default, if no nodes are specified explicitly.
To do this and still keep generality two functors can be specified for
every type. The pre functor is called on the way down and
should select the nodes to traverse and do the actions needed on them.
The post functor is called when all children are traversed and
should do necessary cleanup.
The signature of the action functors is Action::ResultE traverse( Node * node, Action * trav );.
The function can return the following codes:
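The list of codes is still open in this draft. As an illustration, a sketch
assuming a typical set (Continue, Skip, Quit); the codes and names are this
sketch's assumption, not a settled interface.

    class Node;

    class Action
    {
      public:
        enum ResultE
        {
            Continue, // keep traversing, descend into the children
            Skip,     // don't descend into this node's children
            Quit      // abort the whole traversal
        };
    };

    // A pre functor culling a subtree: if the node's volume is outside
    // the frustum, its children are skipped.
    Action::ResultE cullEnter(Node *node, Action *action)
    {
        (void) node; (void) action;
        bool visible = true; // real code: node volume vs. view frustum
        return visible ? Action::Continue : Action::Skip;
    }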
Intersections: return the closest/ the furthest/ all intersecting primitives, give the distance to the primitive for cones/boxes and of course the point of intersection/closest distance to primitive, allow a discriminator function to accept/reject intersections
An alternative would be active initialisation, but for that somebody has to know all the init functions to be called. That can become messy, especially if additional components are being linked dynamically. Nonetheless we add that via the field container type, which carries an init function. Generality doesn't hurt. ;) (DR)
Every node has the following general functions:
Why undefined order? Is there a derived node type where the traversal order can be controlled for all children (Switch node?)? (AR) Hmm, I don't really care. We could define the order from left to right (first to last index). The idea is that people using groups shouldn't care. If you care about the drawing order, you probably want a decal. Note that state change minimization will change the rendering order anyway, if not explicitly forbidden. (DR)
Fields: none
Derived from: Node
In many cases billboards are just a single polygon. For these it's not useful to use a transformation to position them; they should rather be handled by the BillboardGeometry node.
For intersection traversals the billboard rotates to face the ray. Point billboards just rotate around one axis, as a ray does not have an up direction.
Fields: mode, center, up-axis
Derived from: Group
The simplest is the whichIndex field. It's a simple integer that selects a single child to display. There are some defined values to signal different behavior.
When set to Switch::All all children will be rendered. Similarly, when set to Switch::None no children will be rendered. To select a set of children the value Switch::Some is used. Which children are used is defined by the whichMask field. It's a bitmask, thus the maximum number of children this is applicable for is defined by the bitmask type. Children outside the bitmask range are not traversed. When set to Switch::Single the whichIndex is used to select the child.
This node is not really big, but maybe splitting the functionality into different nodes would result in a cleaner interface.
Fields: which, whichMask
Derived from: Group
This node type is primarily used to keep different versions of its children to make different kinds of traversals more efficient. A typical use would be a low-res version for collision and maybe intersection, a high-res version for reference and data probing, and a rendering-optimized version for rendering.
Fields: traversalTypes
Derived from: Group
Very different LOD selection criteria are possible. The initially implemented one simply uses the distance between the reference point and the viewer to select one of the children. The childDistance field contains the distances at which the respective child becomes active; it should be sorted in ascending order, and the children should be ordered beginning with the most detailed. The field can contain one more entry than there are children. The first entry defines the distance at which the first child becomes active, the last one the distance at which the last child becomes inactive.
Selection criteria that take global stress into account, or that do not simply switch between children but blend or morph between them, may be implemented later. One version will take an error in pixels and a distance in world space per level, and select the level for which the distance, when projected to the screen, is smaller than the pixel error. This mode can especially be used for displaying tessellated free-form surfaces.
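A sketch of the basic distance-based selection described above; the handling
of the boundary cases is this sketch's interpretation.

    #include <vector>

    // Returns the index of the child to traverse, or -1 if no child is
    // active. childDistance is sorted ascending, children ordered most
    // detailed first.
    int selectLodChild(const std::vector<float> &childDistance,
                       int numChildren, float dist)
    {
        if(childDistance.empty() || dist < childDistance.front())
            return -1;  // closer than the first activation distance

        // Optional extra entry: where the last child becomes inactive.
        if((int) childDistance.size() == numChildren + 1 &&
           dist >= childDistance[numChildren])
            return -1;

        int child = 0;
        for(int i = 1; i < numChildren && i < (int) childDistance.size(); ++i)
            if(dist >= childDistance[i])
                child = i; // a coarser child takes over

        return child;
    }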
quality/value? priority/importance -> global selection; accumulative lod
For intersection traversals the highest quality child is used. Should that be selectable?(DR)
Fields: mode, reference point, child distances
Derived from: Group
On the one hand lights only influence their children. Thus it is possible to have lights that only light a part of the scene. But they also need to have a position and orientation in space. This could also be defined by their position in the scenegraph, but that prevents for example attaching a lightsource to a moving object and having it light the rest of the scene.
Cameras face a somewhat similar problem. They are not part of the scenegraph, but being able to manipulate their position and orientation in a way similar to the one used to manipulate objects is useful, e.g. to attach them to moving objects.
Thus lights keep a pointer to a node that defines the coordinate system used by the light. The light's position in the scenegraph defines which parts of the scene are lit.
One problem with this is the need to have a bunch of lights below each other on the top of the scenegraph. I don't see that as a big problem. For high speed rendering more than one or two lights can't be used anyway, and even if a bunch are used the nodes are traversed pretty fast. I like that better than having a new field type Light that has to be derived for all the derived light types. (DR)
explicit light influence areas? activation/selection in material? prelighting? multipass lights?
Fields: reference node, diffuseColor, specularColor, ambientColor, constantAttenuation, linearAttenuation, quadraticAttenuation
Derived from: Group
Fields:
Derived from: Light
Fields:
Derived from: Light
Fields: angle, exponent
Derived from: Light
in tree at all? Would be nice to have somewhat localized fog and ambient light, but how to make it efficient? Real localized fog is a completely different story.
Fields: ambientColor, fogType, fogStart, fogEnd, fogExp
Derived from: Node
Fields: transformation
Derived Fields: toWorld transformation
Derived from: Group
The separate transformations can be one of a set of primitive transformations: Translation, Rotation, Scaling, Matrix, Orientation (from/at/up), Tracer (orients towards a beacon), or Beacon (directly uses the beacon's transformation).
This gives rise to a field type proliferation that I'm not too happy about. But they're just too useful to be ignored.(DR)
Fields: transformations
Derived from: Transformation
Fonts are defined in a separate class, so they can be reused by many text nodes.
Caching? Dlists or just storing the created geo? Automatic?
Fields: font, text
Derived from: Node
Fields:
Derived from: Text
Fields:
Derived from: Text
Fields:
Derived from: Text
A geometry node's data faces a number of challenges. It is very big,
often dominating the memory use of the whole system. Thus it should be
shared between nodes if possible. But usually not all of it is the
same, only parts like vertex coordinates. Thus the node core sharing
mechanism doesn't always apply. Furthermore there is a pretty
significant variety in the form of data that can be used (e.g. Col3f,
Col4f, Col4ub, T1f, T2f, T3f etc.). Fields have to have a specified
type, so a direct field cannot accommodate this polymorphic data.
Thus the separate data fields are kept in separate field containers
called Properties, which are referenced by the geometry node. As field
containers they can be shared, solving problem 1. They can be derived
from a common type, allowing type-safe polymorphism (problem 2). They
would add another indirection, but that can be shortcut at set()
time, thus not incurring overhead.
There are two separate ways of specifying the vertex data. One is
putting all data in an array of structures that keep all the data for
every point. In OpenGL this is usually denoted as interleaved
arrays. The second way is to provide separate arrays (i.e.
properties) for every kind of data (e.g. vertex coordinates, colors
etc.).
A part of the data that is always stored in a separate array is the
primitive lengths. OpenGL has a number of primitives that can have
different lengths (Polygon, TriStrip, QuadStrip, LineLoop, LineStrip)
in addition to primitives that have a fixed size (Points, Lines,
Triangles, Quads). Note that the fixed size primitives are usually used
in large numbers, but they all have the same length and can be
specified using a single glBegin()...glEnd() loop. Thus only
one length is really needed for them. The different length primitives
need a begin/end for every primitive, so to have a number of them in a
geometry node the length has to be stored for all of them.
It would be simple to have only one kind of primitive in a geometry
node. This would be a tough restriction, though. Especially when striping
general models, strips and fans are usually used intermittently. Thus
an array can be specified to define the type of every primitive to be
rendered.
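As an illustration, a sketch of a simple rendering pump walking such type and
length arrays (immediate-mode OpenGL for clarity, not the optimized path). A
geometry with one strip, one fan and four triangles would use
types = { GL_TRIANGLE_STRIP, GL_TRIANGLE_FAN, GL_TRIANGLES } and
lengths = { 8, 6, 12 }.

    #include <GL/gl.h>

    void pump(const GLenum *types, const int *lengths, int numLoops,
              const float *vertices /* xyz interleaved */)
    {
        int v = 0;
        for(int i = 0; i < numLoops; ++i)
        {
            glBegin(types[i]);                      // one loop per primitive
            for(int j = 0; j < lengths[i]; ++j, ++v)
                glVertex3fv(vertices + 3 * v);
            glEnd();
        }
    }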
The last distinction between different kinds of geometry is the
indexing. The data can be used directly in the order given, with no
indexing. This is always useful for points, but in general non-indexed
data can need a lot of replication. Taking a height field as an
example, vertices in the middle are used by 4 quads. Without indexing
they would have to be stored 4 times. Thus indices can be supplied. In
the most general case a separate index can be supplied for every data
type. In many cases a single index for all data types is sufficient,
and more efficient, thus this is allowed as a special case.
The last part that needs to be specified for the geometry is the
binding. Properties can be used not at all, for the whole object, for a
face, or for a vertex. Only a small subset is actually useful. Texture
coordinates have to be per vertex, overall normals are rarely useful etc.
The main distinction left is between per vertex or per face binding of
colors and normals, which is accomplished by a simple boolean. One useful
special case is an overall color, which can just as easily be defined
by a color field with a single color.
'Per Face' can mean lots of different things. OpenSG uses the 'per begin()/end() loop' version. A real per face for triangles or quads should be simulated by per vertex and proper indexing.
Many of the above concepts are orthogonal and can be combined pretty freely. Total generality is going to be too complicated, though, so we'll have to restrict some combinations and split the full space of possible combinations into several classes. So if there is a somewhat unusual combination that you need, tell us about it. (DR)
The actual rendering is done in different ways, depending on the combination of attributes used and the availability of OpenGL extensions. Vertex arrays are the preferred way of rendering, if possible and supported. For the other cases specialized rendering pumps are generated to specify the data in an optimal way. There is a tricky balance here between efficient pumps and having too much code, thrashing the icache. The optimization action should convert the more exotic combinations into something more easily rendered. (DR)
Geometry nodes can analyse the data and try to render it more efficiently. Examples for this are the glDrawRangeElements() extension or the use of display lists. But deriving this information can be costly, so it only makes sense when the geometry is used more than once, and not for geometry that changes every frame, like morphing models or progressive meshes. Thus the geometry keeps a flag indicating whether this node is going to be static for a while, making analysis worthwhile. This could be done somewhat automatically by keeping a memory of the last frame the node changed and analysing it after a number of static frames have passed. We do that rather successfully in Y right now, so it might be worth adding. (DR) The same reasoning applies to other derived data like BSP trees for efficient back-to-front rendering. These are only possible for TRIANGLES, QUADS and POLYGONS, as they can be rendered individually. BSPs are used in situations where the z-buffer is unavailable, mostly for transparent geometries. By default geometries with 4D colors are not considered transparent, as transparency is very expensive and C4UB is a very common color format. To activate transparency processing the referenced material has to be transparent; its transparency will be replaced by the vertex alpha.
To simplify algorithms that want to work on the triangles, the geometry can create a triangle iterator, which will walk the geometry and return the vertex and primitive indices of all the triangles. This will be less efficient than directly accessing the data, but more convenient for many general purposes.
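A self-contained illustration of what such an iterator does internally for a
triangle strip: vertices i, i+1, i+2 form a triangle, with every other
triangle's winding flipped to keep a consistent orientation.

    #include <cstddef>
    #include <cstdio>
    #include <vector>

    void walkStripTriangles(const std::vector<int> &strip)
    {
        for(std::size_t i = 0; i + 2 < strip.size(); ++i)
        {
            int a = strip[i], b = strip[i + 1], c = strip[i + 2];

            if(i % 2 == 1) // odd triangles are wound the other way
            {
                int t = a; a = b; b = t;
            }
            std::printf("triangle %d %d %d\n", a, b, c);
        }
    }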
Additional vertex/face data can be added as attachments, if the application needs it. Some examples would be data for collision detection or for radiosity calculations. This data is not automatically updated or invalidated, though; the application has to take care of that. Should we make this simpler? I don't like the idea of having to traverse the attachment map in every changed() call for a geometry, or for every field container in general. It would make the system more general, though. But for these big additions derivation might be acceptable, allowing overriding of changed(). Alternatively, if we have a timestamp scheme for the automatic update of the geometry's derived data, storing a similar timestamp in the attachment and giving access to the geometry's lazy update is possible. As long as the timestamp access is fast enough, the overhead should be acceptable. (DR)
highlighting/selecting?
dlist priority? per object importance?
per frame calculated attributes (bumpmapping texcoords). calculated in
app/cull/draw?
single poly billboard? for particles sprite extensions? cpu-based billboards
(with/without matrix stack changes)
localized fog with bvolume
clipping
guaranteed framerates/progressive refinement rendering?
volumerendering?
Polygonal geometry is not enough for many applications, especially for VR applications in design and construction industries, e.g. automotive. Thus OpenSG contains capabilities to convert higher-order representations of surfaces into a polygonal representation that can be rendered directly.
The conversion of higher-order surfaces to polygons can happen in different
forms to different levels of accuracy. The most common criterion for
tessellation is a chordal difference measurement that defines a maximum
distance between the original surface and the polygonal representation. An
interesting approach from the rendering side is dividing the surface into a
defined number of polygons that optimally approximate the surface. To
render accurately, surface normals also have to be calculated. Having the
original surfaces available allows calculating the numerically exact normal
at the calculated point.
Both of the above mentioned approaches can be costly to compute. In cases
where a faster result is needed, the surfaces are able to tessellate themselves
according to an abstract tessellation level, specified as a simple unsigned
integer, with 1 being the coarsest tessellation. The only constraint is that
higher levels should create a finer tessellation. This mode is meant for a
rough initial display in situations where a full tessellation is not available
yet, e.g. right after adding the surface, or not feasible due to fast-changing
geometry. Furthermore it's easy to implement and a good start for testing.
Sometimes even the simplest tessellation might be too complex to display the scene in a useful manner. Either because it's too complex to be rendered in an acceptable time, or because the sheer mass hides important details. For these cases the surfaces should be able to create a more schematic representation. For free-form surfaces the borders and trim curves are a natural representation. Other surfaces might use silhouette information or other constructive knowledge.
Higher-order surfaces are derived from Group. Their children should only be generated by tessellating the surface, although that is not enforced. The tessellation can create a single geometry, or an LOD node with several tessellations. It could also generate a progressive mesh for continuous levels of detail.
The above mentioned are general capabilities of all higher-order surfaces; the supported types and more specialized features are described below.
NURBS can be trimmed by piecewise linear and B-spline trim curves. These curves
are defined in the parameter space of the surface. Is that enough or do we
need geometry-space trim curves? How hard is it to convert between both? Do we
need to be able to render trim curves directly? If they're in parametric space
they have to be converted anyway, so the trim curves can be private data
structures of the surfaces that don't need to be understood by the rendering
part. Are other trim curve types needed? (DR)
To define B-spline curves in geometry space we need nothing more than for
defining (2D) trim curves in parametric space, besides one more dimension for
the control points, so we should have objects for 3D B-spline curves (we can
misuse them for 2D). Conversion between geometry space and parametric space is
expensive (and in general approximative in both directions) (I may have some
code for that...), but if we need the parametric-space trim curves only for
rendering, they probably needn't be converted to 3D B-spline curves but can be
approximated piecewise linearly before converting to 3D. (AR)
NURBS surfaces rarely stand alone; many of them are combined to form one
continuous surface. To prevent cracks appearing between these partial
surfaces when tessellating them, the topological relations between the
different surfaces have to be analyzed and stored.
How important are double precision definitions? Is support for double
precision basic types like vectors etc. really needed?
I'm not sure how back patch culling could be integrated without having the
higher-order surface know all about rendering. One idea would be to derive
a special geometry that knows about its originating patch and can cull
itself accordingly. Sounds like a feature for a later release... (DR)
One important aspect of efficient rendering is minimizing state changes. To do that, the objects (in OpenSG: geometries) active in a given frame have to be collected and sorted so that the number and cost of the state changes needed to render them is minimized.
There are two parts to doing this: finding out what the OpenGL state for
the geometry to render is, and sorting the used states so that changes are
as cheap as possible.
Inventor and Performer represent two extremes in doing that. Inventor has no
concept of a material, everything is inherited during the tree traversal. Thus
rendering Inventor geometric objects out-of-order is pretty much impossible.
Performer uses the other extreme, where everything except transformations is
collected in the geoState (there are global default and override mechanisms
which make the picture somewhat less black and white, but that's the idea).
Neither approach seems to be the perfect solution. Out-of-order rendering is important for state change minimisation, but having to specify the light sources in the material seems a bit harsh. OpenSG tries to find a sweet spot in the middle.
The state of the geometry is mainly defined by the material it references.
In addition, the active transformation (the product of the transformations
of all transform nodes higher up in the tree), the active light sources
higher up in the tree and the active environment play a role.
The material attributes are further divided into chunks of attributes,
which usually are changed together, like the lighting parameters, the
state contained in a texture object etc. This chunking is needed to
reduce the number of independent variables to a manageable number, as
the full OpenGL state is rather big.
The simplest material just wraps the OpenGL state and gives an
interface to the chunks that make up the OpenGL state. Chunks can be
added at runtime to allow adding extensions and new features. Not all
chunks have to be present in every material, but chunks can be added to
every material.
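As an illustration, a minimal sketch of such a chunk-container material
(hypothetical names; StateChunk is sketched further below):

    #include <vector>

    class ChunkMaterial : public Material
    {
      public:
        // chunks can be added at runtime, e.g. for extensions
        void        addChunk (StateChunk *chunk);

        // not every type has to be present; returns NULL if missing
        StateChunk *findChunk(const ChunkType &type) const;

      private:
        std::vector<StateChunk *> _chunks;
    };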
The chunks can be a direct reflection of the underlying OpenGL parameters,
but they don't have to be. There can be different chunks for a given type,
with different interfaces. One example would be a chunk that specifies the
lighting parameters based on a common color and an intensity value, or a
chunk that specifies the colors in HSV space instead of RGB.
The chunking model is pretty nice and makes the OpenGL state more
manageable, but there are still some problems I'm not quite sure how to
solve.
One is overlap in chunk state. If two chunks change the same OpenGL state,
there is a problem when changes are minimized. This could be solved by
simply disallowing it, which can be hard to enforce when multiple
extensions are created that access the same new state. It might not be as
bad as it seems, as chunks are not a lot higher-level than the OpenGL
state, so having just one chunk per piece of new state should be
acceptable. However, too fine-grained chunks don't deliver on the chunking
promise of making the number of states to watch smaller. I don't feel
completely confident in the model right now, but it's a decent start.
(DR)
But a material can be more than just a chunk container. Materials specify
the rendering parameters for a surface, however that rendering is realized,
which might depend on the underlying hardware. For example, a material that
uses a surface texture in conjunction with lightmaps might be rendered
using multitexturing if available, and by multiple blended passes
otherwise.
To override some parameters, Override nodes can be placed in the
scenegraph. These nodes contain chunks that override the chunks of the same
type further down the tree, whether in other overrides or in materials.
Chunks have three functions they can perform: activate themselves starting from an empty state, deactivate themselves leaving a default state, and change from themselves to another instance of the same type. Changing from themselves allows derived types to add state which they deactivate themselves when they recognize a switch to an instance of the base type. All these functions have an associated cost that can be queried. This cost can be used by the drawer to minimize state changes. The general problem is equivalent to the traveling salesman problem, though, so an optimal solution is not feasible for every frame. But the chunks should provide the data to base an approximation on.
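A minimal sketch of that chunk interface (hypothetical names and
signatures):

    // Base class of all state chunks.
    class StateChunk
    {
      public:
        virtual void activate  (void) = 0; // set up from the empty state
        virtual void deactivate(void) = 0; // restore the default state

        // switch directly from this chunk's state to 'next' (an
        // instance of the same or a base type); usually cheaper than
        // deactivate() plus activate()
        virtual void changeTo(StateChunk *next) = 0;

        // approximate costs for the drawer's sorting heuristic
        virtual Real32 activateCost  (void) const = 0;
        virtual Real32 deactivateCost(void) const = 0;
        virtual Real32 changeCost    (StateChunk *next) const = 0;
    };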
I like the idea of materials being an abstract interface to an underlying chunk representation, maybe even including multi-pass effects and automatically switching between algorithms depending on the underlying hardware. But I'm a little afraid of combinations and limitations. Let's say I have a nice interface for the lighting parameters, including handling transparency in a decent manner (which means sorting, depth buffer changes etc.). Now I want to change the line width of the geometry rendered using this material. I can use direct chunk access, if I know what to do. But for more complex things that involve several chunks it's going to be difficult to judge the interference between them. This still needs some sorting out, but the basis feels good.
To facilitate efficient state changes it might be useful to keep the fact that a set of chunks belongs together in a material. Then two objects can be compared very rapidly to see if they have the same material state. That should be easy enough by keeping a material pointer/id in the draw tree nodes. (DR)
Just some random thoughts. Well, a lot actually:
Does anybody know of an OpenSource or PD image class in C++ that handles
most of this already? I don't feel like reinventing yaw (yet another wheel).
What about a fast loading image format containing mipmap levels? Either that or
allowing separate images (or a sequence of images) for mipmaps. Might be useful
as soon as expensive operations like FFT are used to calculate
mipmaps.(DR)
The mipmap levels are only considered if the minification filter is set to
a mipmapping filter, otherwise they are ignored. Point and linear filters
are also supported, as well as symbolic filters for default, fastest and
best quality. These depend on the running system and can easily be changed
by modifying the global defaults object. The same constants can be used for
the texture's internal format, in addition to the standard OpenGL ones. The
wrap modes to be used are also stored in the texture. The constants used
are the same as the OpenGL constants, so an application that knows what
it's doing can use new filter mode constants if applicable, or a new
texture chunk type can use the new filters.
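For illustration, a sketch of how setting up such a texture might look; the
class and method names are hypothetical, only the GL_* constants are real
OpenGL:

    #include <GL/gl.h>

    Texture *tex = new Texture;

    tex->setImage(img);                         // image incl. mipmap levels
    tex->setMinFilter(GL_LINEAR_MIPMAP_LINEAR); // plain OpenGL constant
    tex->setMagFilter(FILTER_BEST);             // symbolic quality constant
    tex->setWrapS(GL_REPEAT);                   // OpenGL wrap modes
    tex->setWrapT(GL_CLAMP);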
The texture border is not supported, as it is not recommended on important
hardware anyway.
After the image has been changed the texture class can be hinted to reload the texture, which is more efficient than creating a new one.
paletted textures? still needed? Don't think so. Well, think again: the Pisa bumpmapping uses them, and it looks pretty neat. A bit limited, but neat.
anisotropic filter? just another constant
z textures
detail textures? Use simple multitexture/multipass instead?
chained textures for multitexture?
texture compression -> special filter constants
procedural/animated textures? (sequence, avi, mpeg)
video textures?
framebuffer textures?
cliptextures? Gulp... Looks like a LOT of work... :-/
(DR)
The simplest culler just selects all objects. A view-frustum culler checks the bounding volume of the traversed object against the active viewing frustum. If it's completely outside the frustum it's discarded; if it's completely inside, its children will not be tested; if it's partially inside, the same operation is applied to the children. If the culler hits a transformation it transforms the frustum into the new space and continues working.
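The core decision might look like this minimal sketch (hypothetical names):

    // Possible results of testing a bounding volume against the
    // viewing frustum.
    enum CullResult
    {
        CULL_OUTSIDE,   // discard node and children
        CULL_INSIDE,    // take node, don't test the children
        CULL_PARTIAL    // take node, test the children individually
    };

    CullResult cullTest(const Frustum &frustum, const BoundingVolume &bv)
    {
        bool fullyInside = true;

        for(UInt32 i = 0; i < 6; ++i)  // the six frustum planes
        {
            switch(frustum.getPlane(i).test(bv))
            {
                case PLANE_OUT: return CULL_OUTSIDE;   // one plane rejects
                case PLANE_CUT: fullyInside = false; break;
                case PLANE_IN:  break;
            }
        }

        return fullyInside ? CULL_INSIDE : CULL_PARTIAL;
    }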
Other cullers might use a portal system to cull hidden objects, or select LODs based on a global cost/value scheme, a predictive scheme or a reactive stress-based scheme.
An interesting, if somewhat out of the ordinary, culler is the occlusion culler. It uses OpenGL extensions to check the bounding volumes of objects against the already rendered objects and discards them if no pixel is visible. But that decision can only be made in the drawer process. Thus the occlusion culler has to sort the objects to render the ones close to the viewer first, and insert functors that check the bounding box and skip the next object if it's not visible. Depending on which OpenGL extension is used it can make more sense to check a whole batch of objects at once (to prevent pipeline stalls) and make all the decisions at one point.
Cullers are simple actions. This allows a specialized traversal to apply multiple cullers successively to the tree by cascading them. This can either be done on a node-by-node basis (calling the next culler in line on the result of the last) or, to pipeline the different cullers in different threads, by creating a new temporary tree after some culling steps. That would be architecturally nice, but I'm a bit sceptical about the efficiency of creating a new tree. Creating all the nodes and attaching them to the data might be too expensive. Multi-parent support would be useful here. (DR)
Some cullers may want to take all objects, sort them, and render them according to some criterion like distance to the viewer or importance. After some steps they will insert abort tests, to abort rendering when the rendering time exceeds a target timeframe. To do this a special return value for the action is used. It forces the cull manager not to call the cullers further down the line, but to continue on the next node level. After the whole tree is finished, a finalize method is called on the culler, which creates the result list and calls the next culler in line.
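A rough sketch of that protocol, again with hypothetical names:

    // Return values a culler can give for a visited node.
    enum CullerResult
    {
        CULLER_CONTINUE, // hand the node to the next culler in line
        CULLER_DISCARD,  // drop the node and its children
        CULLER_TAKE      // keep the node, skip the remaining cullers,
                         // continue on the next node level
    };

    class Culler : public Action
    {
      public:
        virtual CullerResult cullNode(Node *node) = 0;

        // called after the whole tree is traversed: build the result
        // list and hand it to the next culler in line
        virtual void finalize(Culler *next) = 0;
    };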
Conceptually it would be nice if the sorting and rendering stages would
work on a standard tree. But as said above, creating a full-featured
multi-thread-safe tree copy for every frame is probably going to be too
expensive. Thus a special culler is used to close the list; it creates a
specialized structure for sorting and drawing, the draw tree.
The draw tree is a tree of a specialized simple node type that keeps a list
of the active state chunks (see Materials), including the active
transformation and lights, and a reference to the object to be rendered.
These don't have to be geometry nodes; every object that has an applicable
rendering method can be used. Maybe just use a functor, which is very
general and already part of the system. (DR) This also allows special
effects like inserting a temporary viewport and storing the result in a
texture. This could be done using real viewports, but for some applications
the smaller solution may be easier to use.
In addition there are three grouping node types: solid, soft and brittle (the names are open for suggestions...). Solid nodes are cast in stone: all their children, and only their children, are rendered in exactly the given order. Soft nodes are the opposite: their children may be rendered in any order, the node can be deleted and its children moved up to the parent, etc. Brittle nodes are in the middle: they guarantee that their children are rendered in the given order, but other things can be inserted between them.
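A compact sketch of the draw tree structures described above (hypothetical
names):

    #include <vector>

    // Ordering guarantees of a draw tree grouping node.
    enum GroupKind
    {
        GROUP_SOLID,    // exactly these children, in exactly this order
        GROUP_BRITTLE,  // children keep their order, others may be
                        // inserted between them
        GROUP_SOFT      // any order; may be dissolved into the parent
    };

    struct DrawTreeNode
    {
        GroupKind                    kind;     // for grouping nodes
        std::vector<DrawTreeNode *>  children;

        std::vector<StateChunk *>    state;    // active chunks incl.
                                               // transformation, lights
        Material                    *material; // fast same-state check
        DrawFunctor                  draw;     // anything renderable
    };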
Multipass rendering is done by adding a node several times to the draw tree, using different chunk sets. Brittle nodes allow doing that easily without giving up all the benefits of state change minimisation. Objects rendered using such a multipass should use a brittle node; later the brittle nodes can be removed and all objects using the same state can be aggregated.
Some multipass algorithms should, for efficiency reasons, only be applied to a part of the scene; e.g. for depth map shadows, one subset of all objects throws shadows onto another subset. There are two ways to identify these subsets. Either store a list of objects for each subset, which constitutes another reference to the objects that has to be removed when the objects are destroyed. Or use a subset of the traversal mask bits to identify the targeted nodes; the multipass process will ignore nodes not matching the mask. A convenience function for propagating all masks from the bottom up, to prevent missing a node because some of its ancestors don't have the mask set, is provided. The mask approach feels more hackish/low-level, but I'd guess it's pretty flexible and hopefully efficient enough. (DR)
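The mask test itself is cheap, as this illustrative fragment shows (names
and bit assignment hypothetical):

    // Traversal mask bit reserved for shadow casters.
    const UInt32 ShadowCasterMask = 0x0010;

    // During the shadow pass, nodes without the bit are ignored.
    if((node->getTravMask() & ShadowCasterMask) != 0)
        shadowPass.addNode(node);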
The draw tree is not thread-safe, as it doesn't need to be; it is passed in
pipeline fashion to the rendering module.
All the intermediate nodes might not be very efficient, but I think the
model is pretty general and useful the way it is described. Several
optimizations are possible with cullers that know more about the rendering
and can create a smaller tree (e.g. cullers that know about multipass or
transparency and can skip a lot of the brittle nodes). (DR)
These are the random thoughts for the whole rendering chapter.
callbacks? derive new nodes/chunks
mirrors
detail culling: ignore sets < #tris or < bbox size
frame abort before overrun, needs prediction for blocks
transparency, trans per face (BSP, no other sensible way)
fixed framerate by early abort. needs dtree sorted by
importance/distance to viewer. Abort when? Either safety margin to
frame border, or just after missing it (stupid), or having some
prediction to make the safety margin smaller.
dynamic envmaps
motion blur
But windows are only a part of the window-system dependent data that needs to be handled.
The frame-buffer configuration defines which of the OpenGL-defined buffers (front/back, stencil, depth etc.) are active for the created window. OpenSG can query this information from OpenGL after window creation, but in general the user will not want to care about selecting the right configuration. Thus a number of functions to select a visual that can be used to create the window is supplied for each window system. These selectors can try to select a general visual, but for better resource usage they need hints about which features will be used. As more general applications might not know that, the selectors can analyze a given scenegraph to find out which features need to be supported.
The last structure that needs to be organized is the rendering context. For a multi-threaded system this needs to be handled carefully, because the context can only be bound to one thread, and only this thread can call OpenGL commands. Thus the window system should not create a context (or should destroy the one it already created) and rather let OpenSG create and activate the context. All created contexts will share display lists and texture objects. My experience is X-based. Does creating a window in a different thread than the one rendering to it work on NT at all? What are the constraints? (DR)
A separate drawing thread can be created for every graphics pipe in the system. The pipe-specific information is collected in a pipe object. The pipe object keeps pipe-specific information like the width and height, the supported OpenGL extensions, the drawing thread and a list of the windows that are open on the pipe.
OpenSG stores an object for every window. It holds information to identify the window, the context to be used for it, its position and size, and a shadow copy of the context's OpenGL state to minimize useless state changes. Still needed? Or depend on material state knowledge? (DR)
Window callbacks? -> decorator ?
overlay? pbuffer?
context sharing? Should be automatic as much as possible.
problem: keeping dlists/texobjs consistent across the contexts. idea:
associate a creation functor with every list/texobj on creation, so
that they can be recreated for a new context if needed. Might be
impossible if data goes away after creation.
alternative config wins?
window attributes: rgb size, stencil depth, depth depth, dest alpha, stereo,
multisamp (extendability), accum buffer
A window is not rendered into directly; it can be subdivided into multiple viewports.
Viewports are ordered, and all of them are rendered in that order before the final image is swapped. Windows have methods to add a new viewport to the front or back of the viewport list. Note that the system can (and will) create scratch viewports on its own for special features that need prerendering. These are added at the beginning of the list and as such should be invisible, unless the application's viewports don't cover the whole window, in which case they might show through. To prevent that, applications that don't utilize the whole window should create viewports covering the unused area which only clear their background but don't have a scene to render. That would be a good idea anyway, as otherwise residue from finished applications might show up in those areas.
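A minimal sketch of such a filler viewport, using hypothetical names:

    // Cover the unused bottom quarter of the window with a viewport
    // that only clears to black and renders no scene.
    Viewport *filler = new Viewport;

    filler->setSize(0.f, 0.f, 1.f, 0.25f);  // relative coordinates
    filler->setBackground(new SolidBackground(Color3f(0.f, 0.f, 0.f)));
    filler->setRoot(NULL);                  // no scene to render

    win->addViewportToBack(filler);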
Viewports clear themselves using a background object they keep referenced.
OpenSG supports a number of background objects that allow simple creation
of interesting backgrounds. The simplest doesn't clear the background at
all, allowing overlays onto other viewports; consecutively more complicated
ones clear the background to a solid color, a static gradient, a gradient
depending on camera orientation, or a textured background depending on
camera orientation. Backgrounds can also use a combination of image and
depth buffer to clear the viewport to a predefined state, onto which other
objects are rendered and integrated correctly.
To draw viewports one has to post a draw request to them. They are then scheduled to be redrawn, but the drawing is only done after an explicit global redraw call. This is also the point where windows that were uncovered by the user since the last redraw are redrawn. Note that if one viewport of a window is redrawn, all of them have to be redrawn, due to the OpenGL definition of backbuffer behaviour.
Viewports can be synchronized to force a joint redraw whenever one of them is redrawn. The buffer swaps of their windows will then be synchronized, so that they swap at the same time.
Viewports have a target buffer to draw to. To create a pseudo-single buffered viewport it is possible to change that target buffer to the front buffer. Or the target buffer can be directed to the left or right buffer, as needed for stereo rendering.
The viewport also contains a pointer to the camera that should be used to render the viewport.
Viewports also have a function for keeping data. As there can be any number
of viewports active, all of which might use a different camera and see a
different scenegraph, data that should be kept from frame to frame, e.g. to
exploit frame-to-frame coherence, has to be kept in the viewport. The
viewport has an attachment map similar to the ones used by nodes and node
cores.
Cameras can be decorated (i.e. extended using the decorator pattern) in several
ways. Note that one camera can be used by multiple viewports, if needed with
different decorations.
A stereo decorator will create projection and offset parameters to create an
image to be viewed on a stereo capable display.
A projection decorator, derived from the stereo decorator, will create projection parameters for drawing images to a head-tracked stereo screen. It needs another node to reference the position of the viewer relative to the projection screen, in addition to the geometry of the projection screen. Where to put the fog handling kludge? (DR) A rotation decorator will turn the camera a specified amount to the side to render panorama projection screens. Is that really enough? Still can't believe it. (DR)
A subimage decorator will cut out a piece of the image and use it to fill the viewport. This can be used to split the image across multiple viewports, possibly on multiple graphics pipes, e.g. for a multi-pipe powerwall display.
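A sketch of the decorator idea, with hypothetical names:

    // Base camera: produces projection and viewing matrices for a
    // viewport of the given size.
    class Camera
    {
      public:
        virtual Matrix getProjection(UInt32 width, UInt32 height);
        virtual Matrix getViewing   (void);
    };

    // A decorator wraps another camera and modifies its results. One
    // camera can be wrapped by several decorators, so it can be shared
    // by multiple viewports with different decorations.
    class CameraDecorator : public Camera
    {
      public:
        CameraDecorator(Camera *inner) : _inner(inner) {}

        virtual Matrix getProjection(UInt32 width, UInt32 height)
        {
            return _inner->getProjection(width, height); // pass through
        }

      protected:
        Camera *_inner;
    };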
Another bunch of random thoughts. These are just to remind me of what things to think of.
rendering flags like drawboxes depending on Camera? different
classes?
cave specific near/far?
single viewer coordinate system? separate methods to get
proj/trans/projtrans and virtual setup to be overridden by
decorators
wall rendering using standard viewing model
panorama rendering
dome rendering
callback functions to override default behaviour:
culling/drawing/swapping? decorators?
frame rate management: fixed rate
viewports, vp specific data (sort lists etc), callbacks, attached
objects?,
stereo mode per vp
viewport parameter sharing? (Perf: channel sharing) LOD?!?
aspect ratio per viewport and per pixel
scene aa active? if available
light model?
fog? modes?
swap synced (optional!) between viewports/across pipes/machines? redraw
synced
clip policies
video channels? DVR?
explicit sync to next frame to wait for updates and prevent latency due
to buffers
hyperpipe mode?
background stencil image?
rendering per channel (unlocked/synchronized)
rendering modes/viewport (draw boxes etc.)? (DR)
simplifier
triangulator
demo viewer: standalone, win, unix, GLUT, QT, GNOME, MFC
builder
calc normals
unify
spatializer: join small parts, split large parts, build hierarchy
performer loader support
loaders: vrml2, fhs, iv/vrml1, image as terrain?, iges, step?
external loaders. identified by extension? initially ok, but better would be a
way to use fingerprints in files. how to map fingerprint to filename?
extension aliasing
loader attributes
preloaded/linked loaders
transparent zip/gzip support? lha/winzip/arj/ace/rar? PenguinFile?
testing format for functionality? scripting interface probably better.
optimizer: optimize all by default, give max performance. option: keep groups,
keep a given list of groups. compromise between Performer and J3D
strip colorizer/visualizer
simplifier
progressive meshes?
disassembler: separate sets into single triangles with attributes. useful for
striping/fanning/calcing normals etc.
triangle iterator: iterate through all triangles of a geometry/subtree
to work on them.
(DR)
I have been working at IGD since 1992. All my work was connected in one way or another to high-performance rendering. First for the Genesis Radiosity system (Versions 1 and 2), later for the Vis-a-vis rendering kernel. My diploma thesis [3] was the design of a new rendering system, Y, which has become the basis of the Virtual Design 2 VR system. The renderer is not quite as complete as Performer, but usually just as fast, so that's ok with me.
initial version
[2] "Performer", SIGGRAPH 94
[3] "High-performance high quality rendering for Virtual Reality Applications", Dirk Reiners, Diploma Thesis, Technische Hochschule Darmstadt, 1994
The contents of this document are (c) 2000 ZGDV e.V. Darmstadt.