Friday, 24 June 2016

xygine Feature: 3D Model Component



xygine is, at its heart, a 2D framework. While working on Lunar Mooner, however, it occurred to me that attaching 3D models as components may look better than creating 2D sprite sheets by pre-rendering 3D models. A lot of work is needed to create a sprite sheet, particularly when updating one after tweaking a model, so it seemed to make sense to provide the ability to render 3D models directly in a scene. Don't get me wrong, xygine isn't going to become xyzgine at any point (geddit?) – the scene itself remains entirely 2D and any model attached to an entity remains firmly rooted in two dimensional space. It does mean, however, that 3D rendered models reap the benefits of using raw OpenGL, such as real-time lighting and dynamic shadow casting. xygine already contained a lighting system which implemented point lights and a single directional light, so building on this was an obvious choice. It means that 3D rendered objects are lit in exactly the same way as any of the traditional drawables, making everything blend together nicely. Rendering 3D models directly also means that once a model is animated it can be used right away, without the laborious task of stitching together sprite sheets.



Actually implementing what became the MeshRenderer wasn't trivial, however. Thankfully the task was made much easier by the existing SFML Texture and Shader classes, which provide the ability to obtain the underlying OpenGL handle - this gives access to all the flexibility of raw OpenGL calls while still being able to use the SFML interface. One of the biggest problems was the fact that the MeshRenderer relies on OpenGL 3.2 as a minimum, while SFML targets GL 2.0. This can cause some compatibility problems, particularly on OS X and on Macs with integrated Intel GPUs. The MeshRenderer is a completely independent part of xygine, though, so if it is not needed it can simply never be instantiated and the compatibility problems aren't an issue. The MeshRenderer works by watching a scene and maintaining its own view based on the currently active camera. It then makes sure any model components, created via the MeshRenderer's factory function, are correctly aligned with the two dimensional scene. This does mean that the 3D world units are measured in SFML units (approximately pixels), which can be a bit confusing at first. 3D models are often authored in much larger units such as inches, feet or metres, which can make them appear very small when initially loaded into xygine. The easiest solution to this is to set a scale on the model's parent entity.

Currently the supported model format of choice is *.iqm – an open binary model format developed by one of the Sauerbraten guys, Lee Salzman. There is an exporter written for Blender which, while not totally intuitive to use right off the bat (for me at least), provides clean and feature rich output of IQM model files. It has the benefit of not needing an intermediate format (although it also provides IQE, an ASCII based output), and one of its features is the ability to scale models on export, so you can make sure they'll fit xygine perfectly. Currently xygine will load an IQM mesh file and any animations which are compiled within it. It also reads material data, but as yet does not parse it, so materials have to be loaded by hand. I also plan on supporting animation-only files which can be loaded on to a mesh separately.

The MeshRenderer takes a deferred rendering approach, so currently does not support transparent materials, but it does mean lighting calculations are efficient and, even with multiple shadow maps, a good (~200fps) frame rate can be maintained at 1080p on my old 2.1 GHz Intel machine with an Nvidia GTS 450 GPU. Performance can vary of course, mostly due to the number of active shadow maps, of which there can be a maximum of nine, eight of them belonging to point lights. Point lights can have shadow casting disabled, however, allowing some fine tuning per application. As the MeshRenderer is drawn over the top of a xygine scene, the scene itself generally appears behind everything rendered in 3D. It is entirely possible, on the other hand, to render the scene to its own texture and use that with a quad placed in the MeshRenderer. This gives proper z-ordering results, as well as meaning that 2D objects will receive shadows, for some interesting effects. Quad and cube meshes are readily available for this because the interface for model loading is designed to be flexible enough that loaders for other formats can be created with minimal effort: as long as the MeshBuilder class is inherited, a loader for *.obj, *.mdl or any other type of model can easily be implemented, which also makes it trivial for xygine to provide ready-made MeshBuilders for cubes and quads. I've completed as much of the doxygen documentation as possible, although I've yet to fully update the wiki as some of the aforementioned features are yet to be implemented or finalised. The xygine example application includes a MeshRenderer in the 'Platform Demo', a video of which can be seen here:





The menu on the side appears in debug mode and allows switching the output between the various deferred renderer stages as well as the active shadow maps (black shadow maps are inactive or non-existent shadow casters). The release build of xygine completely omits this menu for the sake of performance. There are also a series of console commands available for the MeshRenderer, which can be found by typing 'list_all' into the console while a MeshRenderer instance is active. These mostly mirror the function of the debug menu.



As always the full source is on Github, released under the zlib license. I'm currently looking for OS X testers, so contributions via the Github page are gratefully accepted.



References:


Batcat model supplied by my talented friend Josh

Saturday, 28 May 2016

Explicit function parameters in C++ 11 (and a lesson in cross platform data types...)

A slight detour from the normal content this post: I recently discovered a neat trick in C++ which I'd really like to share. While working on xygine I blundered into a bug when using the following function in the MultiRenderTexture class:

create(sf::Uint32 width, sf::Uint32 height, std::size_t count, bool depthBuffer = false);

What this particular function actually does isn't important, so much as what I was doing wrong. (If you're interested, it creates a new render target sized width * height with count textures.) In my code I mistakenly missed out the count parameter:

create(w, h, true);

The problem with this is that the boolean value implicitly converts to an integer without so much as a warning, and, because of the default value supplied in the function's signature, what the compiler saw was:

create(w, h, 1, false);

The result was that I couldn't fathom why a depth buffer wasn't being created on my render target (due to the 'false' parameter). Once I finally figured out what exactly the bug was, it got me thinking about how class constructors can be marked 'explicit' to prevent this sort of thing from happening, and how useful it would be to do the same for normal (and member) functions. The explicit keyword cannot be applied to functions in the same way as constructors, but a short trip on the googlemobile revealed this, in my opinion, much underrated answer on stackoverflow. The idea is this: using the C++11 delete feature (you've probably seen it applied to default copy constructors and assignment operators), which *can* be applied to functions, you create a deleted template function so that all but the explicitly stated overloads are deleted. Those signatures are then implicitly explicit, if you will. The modified create function then looked like this:

create(sf::Uint32 width, sf::Uint32 height, std::size_t count, bool depthBuffer = false);
template <typename T>
create(sf::Uint32 width, sf::Uint32 height, T count, bool depthBuffer = false) = delete;

Now, with anything but a std::size_t as the count value, the compiler would throw an error. Perfect. Or was it? What followed next was a little lesson in typedefs, and in cross platform implementations of the STL.

Eagerly I tried my modified source on Linux. On Windows I had been working with a 32-bit build, and with the explicit function parameters in place I needed to state '1u' as the count value. Not even '1' but '1u', as std::size_t is a typedef for an unsigned int there, and the signature had to match exactly. Unfortunately when compiling the source on Xubuntu with g++ 5.1 I was presented with an error claiming no matching signature was found. Hm.
Being the patient and open minded soul that I am, I immediately wrote it off as 'a bug in g++', cursed Linux a little bit, then went away disappointed that my new found trick wasn't going to work everywhere. Of course I was wrong. Later that day a discussion on IRC about how std::time_t is a 64-bit value in 64-bit builds but only 32 bits in size in 32-bit builds got me thinking... to which other typedefs does this apply? Quickly I realised that, unlike my current Windows build, my Linux build was 64-bit and, indeed, std::size_t was a 64-bit typedef. To get a matching signature g++ was expecting '1ul', not '1u'. Hoist by my own explicit petard! Simply changing the type from std::size_t to sf::Uint32 (SFML has its own set of nice cross-platform typedefs - we'll ignore the fact that any of these types are hugely overkill for representing any value less than 5...) ensured that the deleted template method now worked across all builds. Fantastic!
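
For reference, here's a minimal, self-contained sketch of the technique (not the actual xygine code) using a fixed width type for the count parameter; the commented-out calls are the ones which now refuse to compile:

#include <cstdint>
#include <iostream>

//only the overload below is callable - any other type passed as 'count'
//resolves to the deleted template and the compiler rejects the call
void create(std::uint32_t width, std::uint32_t height, std::uint32_t count, bool depthBuffer = false)
{
    std::cout << width << "x" << height << ", " << count << " target(s), depth buffer: " << std::boolalpha << depthBuffer << std::endl;
}

template <typename T>
void create(std::uint32_t, std::uint32_t, T, bool = false) = delete;

int main()
{
    create(800u, 600u, 1u);     //OK - count is explicitly unsigned
    //create(800u, 600u, true); //error: use of deleted function (bool)
    //create(800u, 600u, 1);    //error: use of deleted function (int)
    return 0;
}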

 And not one, but two lessons learned.

Wednesday, 25 May 2016

xygine Feature: Deferred Rendering

Although I claimed that I wouldn't be posting a lot about xygine outside the wiki, I also have to admit that every once in a while I like to give my own horn a damned good toot. One of my favourite features of xygine is the MultiRenderTexture. This class inherits the sf::RenderTarget class from SFML, making it a compatible drawing target that behaves not unlike a regular sf::RenderTexture. The main difference is that it contains up to four textures which are all drawn on at the same time. This is ideal for effects such as deferred rendering, where normal map, colour and mask data are all drawn to separate textures, then blended in a single call which performs all the lighting calculations at once. This gives rather pleasing results when combined with the lighting and default normal map renderer provided with xygine (the banding is an unfortunate side-effect of video compression):


It is of course also flexible enough that user defined shaders can be implemented easily for any effect desired. The source for the demo is now included as part of the example project in the xygine repo. The textures were created from a 3D model made by a good friend of mine, whose other, rather beautiful, creations for Dota 2 can be found here.
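
To give an idea of how a pass hangs together in practice, here's a rough sketch. Note that this is illustrative only - the xy namespace and the display()/getTexture() calls are assumptions on my part rather than something lifted from the xygine documentation:

#include <SFML/Graphics.hpp>
//plus the relevant xygine MultiRenderTexture header

//illustrative only - the xy namespace and display()/getTexture() are assumed names
void renderDeferred(xy::MultiRenderTexture& gBuffer, const sf::Drawable& scene,
    const sf::Shader& gBufferShader, sf::Shader& lightingShader)
{
    gBuffer.clear();
    gBuffer.draw(scene, sf::RenderStates(&gBufferShader)); //the shader writes to gl_FragData[0..2]
    gBuffer.display();

    //hand the separate targets to the lighting shader which blends them into the final image
    lightingShader.setParameter("u_colourMap", gBuffer.getTexture(0));
    lightingShader.setParameter("u_normalMap", gBuffer.getTexture(1));
    lightingShader.setParameter("u_maskMap", gBuffer.getTexture(2));
}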

There is one small caveat however: due to the inverted Y coordinates of SFML, the output of the MultiRenderTexture can appear upside-down when drawn with a regular sf::Sprite. SFML works around this internally by flipping its textures, but unfortunately xygine does not have access to this. It can easily be worked around when using an sf::Sprite, however, by setting the sprite's Y scale to -1. VertexArrays can simply invert their texture coordinates, and, when feeding the output to a shader uniform (as one would when combining the textures into a final image), the inversion doesn't matter at all because it is as OpenGL natively expects it to be.
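
In code the sprite workaround looks something like this (again, getTexture() is an assumed accessor - the rest is plain SFML):

#include <SFML/Graphics.hpp>
//plus the relevant xygine MultiRenderTexture header

//getTexture() is an assumption on my part - the point here is the sf::Sprite flip
void drawFlipped(const xy::MultiRenderTexture& mrt, sf::RenderWindow& window)
{
    sf::Sprite sprite(mrt.getTexture(0));
    sprite.setScale(1.f, -1.f); //mirror vertically
    sprite.setPosition(0.f, static_cast<float>(mrt.getSize().y)); //shift back into view after the flip
    window.draw(sprite);
}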

Of course the MultiRenderTexture isn't limited to deferred rendering - other tricks can be performed if you're feeling creative. For example this scene is rendered to one texture normally, but on the second texture a faux 'depth' value of the drawable is used to set the pixel colour. Blurring a copy of the scene and then blending it using the faked depth texture can provide an interesting depth-of-field effect:

uniform sampler2D u_texture;
uniform float u_depth;

void main()
{
    //write the scene colour to the first target and the faked depth value to the second
    gl_FragData[0] = texture2D(u_texture, gl_TexCoord[0].xy);
    gl_FragData[1] = vec4(vec3(u_depth), 1.0);
}


MRT Textures



uniform sampler2D u_colourTexture;
uniform sampler2D u_blurredTexture;
uniform sampler2D u_depthTexture;

void main()
{
    float amount = texture2D(u_depthTexture, gl_TexCoord[0].xy).r; //all channels are the same so pick one
    vec4 colour = texture2D(u_colourTexture, gl_TexCoord[0].xy);
    vec4 blurredColour = texture2D(u_blurredTexture, gl_TexCoord[0].xy);
    gl_FragColor = mix(colour, blurredColour, amount);
}

Output of blended textures

This is but one of the many features xygine has to offer. To find out more about what you can do with xygine take a look at the wiki.

Wednesday, 18 May 2016

SFML TMX Map loader 2.0.0

...is being worked on. No, it's not out yet so don't get too excited. I thought I ought to post a status update, however. The popularity of the map loader still surprises me - after all it was a learning project which I started 3 years ago now (a long time in the world of programming) because I wanted to load TMX files from Tiled for a project I was working on. While the map loader has certainly come a long way since, and supports some features of which I am very proud, it is, ultimately, a buggy, poorly designed mess.

A few months back I set out to address this, and created a new branch on the github repository. I've even worked on it a bit. So far I've taken the peculiar naming scheme I used at the time and replaced it with something that regular SFML users will be used to. I've paid serious attention to the build system too - the CMake file is much improved and compatible with KDevelop and QtCreator. There's also an included Visual Studio project for the library and example files. On top of this I've made sure the interface is properly exported so that the library can be built as a shared library, be that a .dll, .so or even a .dylib. I've worked on a few issues from the tracker too, including vastly improving the MapObject class by making the most of SFML classes such as sf::Transformable.

Unfortunately there's still some way to go before a full release. I've updated the issue tracker where I can, with bugs and features tagged for the 2.0.0 milestone which I'd like to fix and implement eventually. Time, as ever, is the enemy unfortunately, and I just don't have enough of it to work on all those issues right now. I'm not abandoning the project, but I would like to put it out there that I, and all the other users of the library, would certainly be grateful for any contributions (as we are to all the existing contributors). Hopefully a community driven version 2.0.0 will prove to be a vast improvement over the current version. You can preview the already superior version of the map loader by checking out the 'next' branch of the repository.

Monday, 9 May 2016

Introducing xygine

I have briefly mentioned xygine in previous posts, and now that it's further along in development I'd like to talk about it in a little more detail. Over the last few years, working on various game-oriented projects, I started to build up quite a large reusable codebase of features often used in development: things such as an entity-component system with a renderable scene graph, post process effects, networking connections and configurable animation systems. I eventually collected all of these into a single library to which any SFML based project can be linked, so that game prototyping can be done quickly and easily while remaining flexible. Boilerplate code such as reading/writing preferences to disk or creating a state stack is all provided, so that combining stock components with custom component data can quickly create game entities, and ideas can be tried out in a relatively short amount of time. This library I have dubbed xygine - simply xy because of its 2D nature, along with 'gine', short for engine. Strictly speaking xygine is a framework and not an engine, but xyfram wasn't as catchy...

xygine is open source under the liberal zlib license and so can be used freely in any project. It works on most supported SFML platforms: Windows, Linux and OS X, although it is not tested on mobile platforms. Currently it is very usable - I'm developing Lunar Mooner (working title), a space themed rescue game, with it - although xygine receives frequent updates as I uncover bugs throughout development or decide to add new features. Because of this it hasn't warranted a 1.0 release... yet. Ideally I'd like to post various tutorial type topics on this blog about it, but that may not happen as the xygine wiki already has a decent amount of content, and I'd like to keep information as centralised as possible, so any tutorial based stuff will most likely appear there. I've also gone to some lengths to try and document xygine as completely as possible, including full doxygen compatible comments. The documentation can be generated directly from source, and I maintain a copy online here, although it may occasionally fall behind the current revision. As a quick demo here's a work in progress video of Lunar Mooner:



There is a list of other games (including pseuthe!) which are based on earlier versions of xygine on the wiki. The xygine repository also includes an example project which demonstrates how to build and link to the library, as well as some of its features such as particle systems, the physics binding to Box2D and a networked (online!) version of pong.

Dynamic lighting and particle systems

Physics with Box2D


While I hope that other people may find xygine to be useful I'd also like to point out that it is part of a learning process, for me, personally. There are certainly flaws in certain aspects of the design as well as in the codebase itself, so while I'd love to hear of other people using xygine I'll not tout it as an all dancing game-dev magic bullet. On the other hand, if not via xygine itself, I'm confident that it'll provide a great platform for good things to come.

Download

Wednesday, 20 April 2016

SpIn - A Space Invaders / Intel i8080 Emulator

So as a follow-up to my CHIP-8 interpreter I dove right into some interpretive emulation. Next on the roadmap to learning emulation seems to be the Intel 8080 processor, according to all the good learning sources, as its simpler instruction set makes it a good choice when you want to see encouraging results relatively quickly. In fact, to play Space Invaders, you don't need to emulate the entire instruction set.
   This isn't going to be a long or particularly technical post, however, as, similarly to the CHIP-8 post, I don't want to reiterate information already shared and better explained by other sources - rather I'd like to share some of my own specific experiences.
    Firstly: interpretive emulation is tedious. Implementing each opcode is dull, and you need to implement a LOT of them before you see any result. If this taught me anything it's the power of unit testing. Being able to test each opcode individually is a real benefit and makes life much easier when it comes to debugging. Yes, you have to write more code, but it's worth it, believe me. You can see how I set up my unit tests here - although on reflection the code design could have been much better, which would have allowed the use of unit testing frameworks such as Catch or the Visual Studio test suite. You live and learn, I guess.
    Secondly, implementing a video system to draw the graphics is rather interesting. Of course I used SFML for my graphics and audio output (as well as windowing and input parsing), and translating emulated video memory to pixels on an OpenGL texture was an enlightening experience. What really pleased me about this project, though, was that, although the software is far from bug-free, it's accurate enough to run not just the original Space Invaders software but other games too, such as Lunar Rescue and Balloon Bomber, which were designed for the same hardware.
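
As an example of the video side, here's a simplified sketch of how the 1 bit per pixel framebuffer (which lives at 0x2400 - 0x3FFF on the original hardware, 256 x 224 pixels) might be expanded into RGBA data for an sf::Texture. The 90 degree screen rotation of the real cabinet and the exact bit ordering are glossed over here to keep things short:

#include <SFML/Graphics.hpp>
#include <array>
#include <cstdint>

//expands the 1bpp framebuffer into an RGBA buffer suitable for sf::Texture::update().
//screen rotation and bit ordering details are ignored in this sketch
void updateDisplay(const std::array<std::uint8_t, 0x4000>& memory, sf::Texture& texture)
{
    static const std::size_t width = 256;
    static const std::size_t height = 224;
    static std::array<sf::Uint8, width * height * 4> pixels;

    for (auto i = 0u; i < width * height; ++i)
    {
        std::uint8_t byte = memory[0x2400 + (i / 8)];
        sf::Uint8 value = (byte & (1 << (i % 8))) ? 255 : 0; //lit or unlit

        pixels[i * 4 + 0] = value;
        pixels[i * 4 + 1] = value;
        pixels[i * 4 + 2] = value;
        pixels[i * 4 + 3] = 255;
    }
    texture.update(pixels.data()); //texture created as 256 x 224 elsewhere
}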


I admit I had big plans for this project, including extending the instruction set to emulate the (theoretically) compatible Z80 processor, although this is unlikely to happen. Currently xygine and a new game project are taking up rather a lot of my spare time. With any luck future posts will be about the features of xygine, and return to a more SFML oriented theme.

The full source code for SpIn is on Github.

References:
Emulator101 - Space Invaders emulation tutorial site.
Computer Archaeology - Arcade machine hardware and software information.

Sunday, 6 March 2016

Chip8 - A CHIP-8 / SuperCHIP Interpreter

Recently I decided life wasn't complicated enough, so I thought I'd turn my hand to emulation programming. It appears to be a very complex topic to get into, and resources on the internet are rarely clear cut. After a bit of research though it became apparent that the general recommendation was to not start with an emulator per se, but rather to look at the 1977 bytecode language CHIP-8 and write an interpreter for it. An interpreter differs slightly from an emulator, as, in this case, the CHIP-8 bytecode is interpreted on the fly by an intermediate layer of software running on the host platform, rather than mimicking any specific piece of hardware. This means that a single CHIP-8 program can be run on multiple pieces of hardware without any changes, assuming that hardware has an interpreter available for it. This was a common approach used by 8-bit machines of the 80s when running BASIC, and is the foundation of modern languages such as Java and anything which uses the .net/mono framework. There are plenty of resources on CHIP-8, a language simplified by the fact that it only has 35 opcodes (45 when extended to SuperCHIP), so I'll not go into detail here, opting instead to link some of the articles I found most useful:

Matthew Mikolay's CHIP-8 breakdown
Cowgod's technical reference
The Cosmac VIP Manual (the original hardware implementing a CHIP-8 interpreter)
Laurence Muller's coding tutorial

The latter is a very useful link when writing your own interpreter, and is the basis for Chip8 (my own implementation). Chip8 differs slightly from the interpreter outlined in the article in that it uses SFML for input and graphics (of course!) and also implements the extended SuperCHIP opcodes. If you are interested in writing your own implementation I highly recommend Laurence's article (as well as the other links for details of all the opcodes), and of course you can check out the source of Chip8 which is released under the zlib license. Here's a video of Chip8 in action:


Rather than talk about Chip8's internals, which are already well documented in the aforementioned articles, I thought I'd write something about the CHIP-8 bytecode itself. When writing Chip8 I wanted a program which would test opcodes for me as I implemented them, so I set about writing a short piece of bytecode which can be found in the Chip8 source here. The bytecode is initialised directly into an array which can be loaded into the interpreter's virtual memory. The test program can be seen in action at the beginning of the video, where it displays a message saying Test OK! along with a logo.
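
Loading the program is simply a matter of copying that array into the interpreter's virtual memory at 0x200, the address at which all CHIP-8 programs start. Something like this (the names here are purely for illustration, and only the first opcode of the test program is shown):

#include <algorithm>
#include <array>
#include <cstdint>

//4KB of virtual memory - programs are loaded at 0x200, the lower
//512 bytes being reserved for the interpreter and font data
std::array<std::uint8_t, 4096> memory = {};

//0x1248 - the jump to 0x248 which opens the test program (remainder omitted)
const std::array<std::uint8_t, 2> programStart = { 0x12, 0x48 };

void loadProgram()
{
    std::copy(programStart.begin(), programStart.end(), memory.begin() + 0x200);
}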

CHIP-8 bytecode varies slightly from most common bytecode formats used in interpreters in that each opcode is 16 bits wide rather than 8. This is because the opcode parameters are encoded in the opcode itself, rather than the following 2 or more bytes as is more common in other languages. CHIP-8 opcodes are big endian, and can be read from memory like so:

uint16_t opcode = (pc[0] << 8) | pc[1];

where pc is the program counter pointing into the interpreter program memory. This is, therefore, also incremented two bytes at a time. A typical opcode, such as JMP which jumps to another address in memory, looks like this:

0x1NNN

The upper nibble of the most significant byte, 1, tells us that this is a jump opcode. NNN are 3 values representing the address in memory to which the program counter should jump. So

0x124D

will jump to address 0x24D. Once the format is understood it is quite easy to write a simple program in CHIP-8 bytecode, especially if you've ever done any programming in an assembly language (if you haven't and are interested in learning then I highly recommend Human Resource Machine as a great introduction). One thing to note about writing in bytecode directly is that, without the convenience of things such as labels, which one might use when writing assembler, you are required to manually track the address of every single opcode in memory. This can become a pain, particularly when inserting a new line somewhere, as this requires updating all opcodes, such as jumps, which use an address as a parameter. In other languages removing lines would be easier, as an opcode could be replaced with a NOP (no operation) which would keep the memory addresses aligned, but in CHIP-8 there is no NOP opcode so we're out of luck. This is why in my code you'll notice a comment on every line starting with 0x2XX, the address in memory at which the current line of code starts. In the CHIP-8 interpreter all programs are loaded at 0x200 in memory, so this is the base address. I recommend planning out your program as thoroughly as possible in advance, as inserting a line, updating all the comments, then seeking out all opcodes which need updating can quickly become tedious. With that in mind let's break down the test program.
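
As an aside, pulling an opcode apart in an interpreter is just a matter of masking and shifting - decoding the jump above, for example, looks like this:

#include <cstdint>
#include <iostream>

int main()
{
    std::uint16_t opcode = 0x124D;

    std::uint8_t instruction = (opcode & 0xF000) >> 12; //0x1 - a jump
    std::uint16_t address = opcode & 0x0FFF;            //0x24D - the jump target

    std::cout << std::hex << +instruction << ", " << address << std::endl;
    return 0;
}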

The very first opcode is a jump. Looking at the example above we can tell that this jumps to 0x248 in memory. This is because static data used to represent sprites (in this case font sprites) and subroutines are placed at the beginning of the program. The jump simply skips this data and moves to the beginning of the executable code. The characters in the reduced font set are used to display the test text, drawn as a series of sprites. Sprites in CHIP-8 are drawn in rows: each row of 8 pixels is represented by one byte, with up to 15 bytes representing 15 rows (0-F in hex). If a bit in a row is 1 then the corresponding pixel is toggled (sprites are XORed onto the display, which is also how collisions are detected); if it is 0 the pixel is left alone. Rows start at the top of the sprite and are drawn moving down the screen. The characters in this particular case are 4 pixels wide, so the lower nibble of each byte is always 0, and are 5 rows high, making each character 5 bytes in size. The comments next to each row of bytes in the source code describe the character which the sprite is meant to represent. After the font set there is an empty byte used to pad the current address to an even value. This is because the CHIP-8 spec requires all opcodes to start on an even address, although in practice I have found that it doesn't appear to matter.
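
For illustration, here's a minimal sketch of how an interpreter might handle one of these draw operations - the display is just a 64 x 32 array of bytes, sprite rows are XORed on, and VF is set whenever a lit pixel is erased:

#include <array>
#include <cstdint>

const std::size_t displayWidth = 64;
const std::size_t displayHeight = 32;
std::array<std::uint8_t, displayWidth * displayHeight> display = {};
std::array<std::uint8_t, 16> V = {}; //the 16 general purpose registers

//draws a sprite of 'height' rows stored at 'sprite', XORing it onto the display
void drawSprite(const std::uint8_t* sprite, std::uint8_t x, std::uint8_t y, std::uint8_t height)
{
    V[0xF] = 0; //collision flag
    for (auto row = 0u; row < height; ++row)
    {
        std::uint8_t rowData = sprite[row];
        for (auto col = 0u; col < 8u; ++col)
        {
            if (rowData & (0x80 >> col)) //this bit is set, so toggle the corresponding pixel
            {
                auto index = ((y + row) % displayHeight) * displayWidth + ((x + col) % displayWidth);
                if (display[index]) V[0xF] = 1; //a lit pixel is being erased - collision
                display[index] ^= 1;
            }
        }
    }
}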

The first piece of executable code is a subroutine. This is used to create a small delay between drawing each character (as well as to test the subroutine opcodes), before drawing the character itself. CHIP-8 includes a timer which ticks down at 60Hz. The subroutine uses this by first copying the value 7 into register V2, followed by moving the value of V2 into the timer counter. The sprite is then drawn, before entering a loop which reads the value of V2 and checks if it is zero. If it is, the program moves on to the next opcode, exiting the subroutine. If it is not, the current timer value is read into V2 before jumping back to the zero check again. This causes the subroutine to loop while the counter decrements; eventually, when the timer has counted down to zero, the subroutine exits.

Following this is the main entry point of the program, at which the jump on the very first line arrives. The code block is quite repetitive, although simple. First the I register is set to the address of the first byte of the character to draw. For example the first byte of the letter C is at 0x202. The opcode documentation tells us that 0xANNN is the opcode which sets the I register. Therefore the next line reads:

0xA2, 0x02,

Then, to decide where the sprite should be drawn on screen, the V0 and V1 registers are set to the X and Y coordinates respectively, where 0, 0 is the top left corner of the display. 0x6XNN is the opcode which loads value NN into register X. 0x2NNN calls a subroutine at address NNN - in this case the subroutine detailed above, which starts at 0x23A:

0x22, 0x3A,

This pattern is repeated for each of the characters that make up the text 'CHIP 8 Test OK!'.
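
On the interpreter side, the 0x2NNN call and its matching 0x00EE return opcode are handled with a small stack of return addresses, along these lines:

#include <cstdint>
#include <stack>

std::uint16_t pc = 0x200;            //program counter
std::stack<std::uint16_t> callStack; //return address stack

void call(std::uint16_t opcode) //0x2NNN
{
    callStack.push(pc); //pc has already been advanced past this opcode
    pc = opcode & 0x0FFF;
}

void ret() //0x00EE
{
    pc = callStack.top();
    callStack.pop();
}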

The next block is a bit more interesting. First a counter is incremented which counts the number of times the test loop has been run. If the value reaches 2 it is reset and the program jumps to check the state of register V4. This is used to decide whether or not to switch the test program to hi-res mode (0x00FF) or low-res mode (0x00FE). The hi-res mode is actually a SuperCHIP addition, and the opcodes are part of the extended set. This is here in the program to check the opcode implementation, as well as the graphics renderer. When the resolution is switched, so is the value of register V4. When the program runs it will now toggle between display modes each time it has finished displaying the test sprites.
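
Under the hood these two opcodes just toggle a resolution flag in the interpreter - in SuperCHIP 0x00FF enables the 128 x 64 hi-res display and 0x00FE returns to the standard 64 x 32 mode:

#include <cstdint>

bool hiResMode = false; //SuperCHIP extended display mode

void executeDisplayMode(std::uint16_t opcode)
{
    switch (opcode)
    {
    case 0x00FE: hiResMode = false; break; //low-res: 64 x 32
    case 0x00FF: hiResMode = true;  break; //hi-res: 128 x 64
    default: break;
    }
}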
    Once the display mode opcodes are tested we find another block of static data. Ideally this should have been placed at the beginning of the code along with the other static information but, as this was added later to test the SuperCHIP large sprite implementation, it was easier for me to insert it near the end, due to the previously mentioned problem with shifting data addresses when inserting new lines of code. The block of data is, in fact, a large sprite, 16 x 16 pixels in size. It works similarly to the CHIP-8 sprites, except that 2 bytes are used per row instead of 1, and the height is fixed at 16 rows, totalling 32 bytes. The opcode used to draw this is the same as the CHIP-8 draw opcode, except that the height value is set to 0, as large sprites always have 16 rows. The opcode immediately preceding the sprite data is a jump, used to skip over the data (if this were mission critical code the data block would certainly be at the beginning, removing the need for any jump). The sprite, when it is drawn, appears as a circular '8' logo on screen.

Finally the program tests the sound output, before entering a delay loop similar to that in the draw subroutine. After the delay the program jumps back to the beginning. To test the sound output CHIP-8 requires only a single opcode, 0xFX18, where X is the V register from which to load the duration value. The CHIP-8 interpreter outputs a single tone for as long as the sound timer value is non-zero. The timer counts down at 60Hz, the same as the delay timer, so setting the sound timer to 20 will output a tone for one third of a second.
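
Both timers are trivial to implement on the interpreter side - they just need decrementing on a fixed 60Hz tick, independently of how fast opcodes are being executed:

#include <cstdint>

std::uint8_t delayTimer = 0;
std::uint8_t soundTimer = 0;

//called at 60Hz - a tone plays for as long as soundTimer is non-zero,
//so 0xFX18 with a register value of 20 gives roughly a third of a second of beep
void tickTimers()
{
    if (delayTimer > 0) --delayTimer;
    if (soundTimer > 0) --soundTimer;
}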

The test program is rather simplistic, and doesn't test every opcode, but it was fun to write and interesting to learn about. Next time, however, I would probably consider using one of the assembler options provided by sites like Pong Story or CHIP8.com, if only because the use of labels would make life easier when inserting lines of code anywhere other than the end. Searching for 'CHIP-8' on Github also reveals many interesting projects, although in varying states of completion.

Hopefully this experience will provide a gateway to future emulation projects, which I will, of course, eventually document in a future post.