Common Bottlenecks


Josh


Leadwerks 4.3 brings a big performance boost to your games. In this blog I am going to talk about some of the common problems you can eliminate to make your games run faster.

When slow performance is encountered, it is typically one really bad mistake that is slowing everything down. Here are a few common performance bottlenecks you can create in your games, how to identify them, and how to fix them.

Shadow Updates

Shadow rendering is cheaper than regular renders because no textures have to be used, but extra rendering passes to update shadows can add up.

How to identify: The game stats will display the number of shadows updated. If this is more than a few, you might have a problem. Remember that point lights require six extra passes, and directional lights three, but both only count as one shadow. You also want your rendered shadow polys to be as low as possible.

How to fix: Figure out what objects are triggering the redraw and whether it is necessary. Non-moving high-polygon models should use the static shadow mode so they don't have to be redrawn during a render. In version 4.3, low and medium light quality settings will also stagger shadow updates so that fewer are rendered each frame. (This can also make it harder to detect a problem, so maybe test on high quality settings when you are examining this.)
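As a rough illustration of why the shadow count in the stats can understate the work being done, the pass counts mentioned above can be tallied as follows. This is only a sketch: the function name and the list of lights are hypothetical, and only the six-pass and three-pass figures come from the text.

```python
# Extra render passes per shadow update, using the figures quoted above.
PASSES_PER_SHADOW = {"point": 6, "directional": 3}

def shadow_passes(updated_lights):
    """Return (shadows_reported, render_passes) for the lights redrawn this frame."""
    shadows = len(updated_lights)  # each light counts as one shadow in the stats
    passes = sum(PASSES_PER_SHADOW[kind] for kind in updated_lights)
    return shadows, passes

# Hypothetical frame: two point lights and one directional light were redrawn.
shadows, passes = shadow_passes(["point", "point", "directional"])
print(f"{shadows} shadows updated, {passes} extra passes")
```

With these made-up inputs the stats would show three shadows while fifteen extra passes were rendered, which is why even a small shadow count deserves a second look.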

GPU Pixel Pipeline

The GPU has a limited number of stream processors to split the task of rendering an image across. When you overload the GPU pixel pipeline, it slows down your program.

How to identify: If you have a much higher framerate at a lower screen resolution, this is probably the cause.

How to fix: Lower light quality settings, remove post-processing effects, or run at a lower screen resolution.
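The resolution test above can be made a little more concrete by comparing how much the frame time grows against how much the pixel count grows. The sketch below assumes you have measured the frame time in milliseconds at two different resolutions; the function name and the sample numbers are hypothetical, and the score is only a rule of thumb.

```python
def pixel_bound_score(res_a, ms_a, res_b, ms_b):
    """Ratio of frame-time growth to pixel-count growth between two test runs.

    Near 1.0: frame time scales with pixel count (the pixel pipeline is the likely limit).
    Near 0.0: frame time barely changes with resolution (look elsewhere).
    The two resolutions must have different pixel counts.
    """
    pixels_a = res_a[0] * res_a[1]
    pixels_b = res_b[0] * res_b[1]
    pixel_growth = pixels_b / pixels_a - 1.0
    time_growth = ms_b / ms_a - 1.0
    return time_growth / pixel_growth

# Hypothetical measurements: 8 ms per frame at 1280x720, 16 ms at 1920x1080.
score = pixel_bound_score((1280, 720), 8.0, (1920, 1080), 16.0)
print(round(score, 2))
```

Frame times are used instead of frames per second because they add up linearly; a framerate can be converted with 1000 / fps.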

GPU Vertex Pipeline

This is pretty rare because the number of vertices the GPU has to process is tiny compared to the number of pixels, but it is possible.

How to identify: Slow speed regardless of screen resolution, slow even when rendering the scene with no gameplay, and extremely high polygon counts in the game stats (like 2,000,000+). There are some applications where extremely high polygon counts are acceptable, but unless you are specifically making such an application and are aware of this, it probably means you should use models designed for real-time games.

How to fix: Use lower-resolution models or lighten up on the vegetation.

Too Many Objects

The renderer itself has a cost of computation on the CPU. The more separate objects there are, the more the CPU has to work to determine what objects are visible. On the other hand, discarding large numbers of objects can give a big speed boost, so it's always a balance.

How to identify: The render time shown in your game stats will be more than a few milliseconds, and the number of batches, displayed in the game stats, will be very high. It's all relative, but if you have a simple scene and 500 batches are being drawn, there is probably a problem. Large, spread-out maps with a dense distribution of objects can often have this problem. This will happen on all machines, regardless of how good the GPU is. The most complex maps I've ever made had about 700 batches rendered. There is no need to go above that no matter how big the map is, because objects in the distance will be culled. The vegetation system does not cost much on a per-object basis, so it is an extremely efficient way to lay down a lot of dense objects.

How to fix: Use the model editor Collapse feature to collapse models into a single object and resave them. Also set the view range of smaller objects to a closer distance so there are fewer of them visible in the distance.
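To see why a shorter view range on small objects helps, here is a minimal sketch that counts how many objects remain candidates for drawing at a given view range. It is not how the engine culls objects; the scene layout, the distances, and the function name are all hypothetical.

```python
import math

def visible_count(camera, objects):
    """Count objects whose distance to the camera is within their own view range."""
    return sum(1 for pos, view_range in objects if math.dist(camera, pos) <= view_range)

camera = (0.0, 0.0, 0.0)
# Hypothetical scene: 100 small props placed every 10 units along a line.
positions = [(10.0 * i, 0.0, 0.0) for i in range(1, 101)]

far = [(p, 1000.0) for p in positions]   # every prop drawn out to 1000 units
near = [(p, 250.0) for p in positions]   # small props cut off at 250 units

print(visible_count(camera, far), visible_count(camera, near))
```

In this made-up layout, cutting the view range from 1000 to 250 units drops the number of props in range from 100 to 25, and each object that falls out of range is one less for the CPU to process.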

Slow Collision

If your game slows down when you get close to an object, you might have a high-poly collision mesh.

How to identify: The physics update time in the game stats will be more than a few milliseconds. Enable "View Physics" in the editor and check to make sure all physics shapes are low-poly.

How to fix: Use the model editor to generate a low-poly physics shape with the best available option.

Code Errors

Is your game constantly reloading files from the hard drive? Are you performing lots of pick operations each frame? Did you create a runaway loop of new objects to process?

How to identify: Comment out sections of your code and test the framerate.

How to fix: Figure out the main section that is causing the slowdown, then keep commenting out smaller and smaller parts until you narrow down the problem. Post on the forum if you don't know why something is causing a performance drop.
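Commenting sections out works, but timing them can point at the culprit faster. The following is a minimal sketch of that alternative, not something the engine provides: the `timed` helper, the section names, and the stand-in workloads are all hypothetical.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

timings = defaultdict(float)  # section name -> accumulated seconds

@contextmanager
def timed(section):
    """Accumulate the wall-clock time spent inside the with-block under a section name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[section] += time.perf_counter() - start

def slowest(n=3):
    """Return the n sections with the largest accumulated time, slowest first."""
    return sorted(timings.items(), key=lambda kv: kv[1], reverse=True)[:n]

# Hypothetical frame loop with two sections standing in for game code.
for _ in range(3):
    with timed("ai"):
        sum(i * i for i in range(20000))
    with timed("ui"):
        sum(range(100))

for name, seconds in slowest():
    print(f"{name}: {seconds * 1000:.3f} ms")
```

Once the slowest section is known, the same helper can be wrapped around smaller and smaller parts of it, mirroring the narrowing-down process described above.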


66 Comments


Found some odd performance difference

1) Build a big box (for terrain).
2) Place the player prefab with the autopistol.
3) Place the crawler prefab and duplicate it into a 5x5 grid (25 crawlers).
4) Play the game, make all of them chase you, and enjoy 60 fps.
5) Duplicate the map (Save as).
6) Remove the crawlers.
7) Replace them with zombie1.pfb in a 5x5 grid (from the Zombie DLC).
8) Play the game and make all of them chase you; the performance drops to around 45-48 fps.

 

I thought the zombie was less complex than the crawler?

This happens with frame limiting on or off. Everything else is the same.

 

i7-2600K

Windows 10

GTX970

8GB ram

1920x1080, fullscreen, light quality=3, msaa2, af=16

Link to comment

Neither of those objects will be high-poly enough to make a dent in your framerate, especially with that GPU, and you're not taking a real measurement of framerate if you have VSync on, which I assume you do since the framerate you quoted is 60.

 

Maybe zombies have more bones, but that still shouldn't make a difference since I can render hundreds of characters at that speed.

 

There are limits to how finely you can analyze this. Sometimes your computer is just running a background task silently on the same CPU core, or sometimes my GPU kicks into a higher speed for no apparent reason and I can hear the fan whine higher.

Link to comment

I understand. But the result is too consistent to ignore. The map with 25 zombie1 drops to about 45 fps even with vsync off (frame limit is also off).

With the crawler I get about 75 fps with the same settings. Below are the two published games. Nothing is different except the start map: one uses the crawler and the other uses zombie1.

 

These are forced 1920x1080, fullscreen. Make sure your monitor supports it.

 

1)Crawler version

https://www.dropbox.com/s/pk3och7lcygydyr/Crawlerversion.zip?dl=0

 

2)Zombie1 version

https://www.dropbox.com/s/5l4tueys4pfbc9w/Zombie1Version.zip?dl=0

Link to comment

I think my game suffers mainly from Too Many Objects syndrome. But I think (fingers crossed) next update should run much better.

Link to comment

I don't know, but I get 120-200 FPS in both running it with the 4.3 beta executable.

 

There might just be a script setting that is set differently in either case.

Link to comment

Hi Josh. What are your specs? Did you get that using the exe I gave above or your own?

In my case, with all 25 zombies chasing me, the performance dropped from 130 to 45.

Kinda strange. Are you using an SSD or a normal hard disk?

Both exes are actually the same project. I just changed the start map.

Link to comment

I know 45 fps is okay with 25 zombies. It is just weird to me that the crawler gets better performance than the zombies on the same PC, with the same settings and everything. I can place almost 30 crawlers before the performance drops below 50 fps.

Link to comment

I assume that is because of the physics and not the models. What if you don't make the model a character controller and just have a static model sitting there? If you get higher fps, then it's the physics, which really wouldn't have anything to do with Lua. If it is the physics, couldn't that be threaded to improve performance?

 

I'd also be curious what happens if you remove the UpdateWorld function from their script. That would be more of a Lua thing. We noticed that when a lot of objects have the UpdateWorld function, it slows things down. There can be alternative ways to achieve the behavior without an UpdateWorld function in each script. An empty UpdateWorld is not enough; the function must not exist at all, so that C++ doesn't call it. The function-calling overhead is what slows things down when a lot of entities have an UpdateWorld function, and if you're doing a good amount of work in there it hurts even more. You want to bail out of UpdateWorld quickly if you can.
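The idea in the comment above can be sketched in a few lines: keep a separate list of only those entities that define an update hook, so everything else costs nothing per frame. This is an illustration of the general pattern, not how Leadwerks dispatches UpdateWorld; the `World`, `Crate`, and `Monster` classes and the `update_world` name are hypothetical.

```python
class World:
    def __init__(self):
        self.entities = []
        self.updaters = []  # hooks of entities whose script defines update_world

    def add(self, entity):
        self.entities.append(entity)
        hook = getattr(entity, "update_world", None)
        if callable(hook):
            self.updaters.append(hook)

    def update(self):
        """Call every registered hook once and return how many calls were made."""
        for hook in self.updaters:
            hook()
        return len(self.updaters)

class Crate:
    """No update hook at all, so the world never calls into it."""

class Monster:
    def __init__(self, target=None):
        self.target = target
        self.ticks = 0

    def update_world(self):
        if self.target is None:
            return  # bail out early when there is nothing to do
        self.ticks += 1

world = World()
for _ in range(100):
    world.add(Crate())
idle, chasing = Monster(), Monster(target="player")
world.add(idle)
world.add(chasing)
print(world.update())
```

Here 102 entities exist but only two hooks run per update, and the idle monster returns immediately.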

Link to comment

@Rick the performance drops when all 25 zombies are chasing you. At the start it is around 120 fps or so. You need to run around to make all the zombies see you and start chasing you. Can you test the links that I posted above and tell me the speed you get in each?

 

Both are using the same default monsterai script.

Link to comment

So a big portion of that is physics then and not the models. I can test when I'm home in 9 hours, but that tells me it's more physics and that's nothing to do with Lua but Newton.

 

@Josh is Newton running on a separate thread? If not I would think we'd get a big boost from that. Or perhaps we can control physics update interval? Do zombies chasing on relatively flat ground (game specific) require the update rate physics currently runs at?

 

I'm interested in this because we are working on a game where this matters as well. The interesting thing to ask is does something chasing you really even need physics? If we are able to get the path points that raycast is using then we can use our own move code and bypass physics completely. The physics is what's killing this.

Link to comment

It could still be the models, or maybe just my PC. But Josh did mention that the zombies have more bones than the crawler. Needs more testing. :)

Link to comment
"@Josh is Newton running on a separate thread? If not I would think we'd get a big boost from that. Or perhaps we can control physics update interval? Do zombies chasing on relatively flat ground (game specific) require the update rate physics currently runs at?"

It did originally but I disabled this in the interests of stability. Normally physics updates only take 1-2 milliseconds.

 

"I'm interested in this because we are working on a game where this matters as well. The interesting thing to ask is does something chasing you really even need physics? If we are able to get the path points that raycast is using then we can use our own move code and bypass physics completely. The physics is what's killing this."

This is something I want to investigate more in the future. I've been doing some work behind the scenes, but it's still too early to show anything. There might be some potential for improvement, but as I have shown previously, performance is quite good already with large numbers of enemies:

Link to comment

You said above that 45 fps with 25 zombies is reasonable. I'd argue against that, given he gets 120 fps until they start chasing him. I've noticed this slowdown when being chased in the past as well. The question is what's causing 120 to drop to 45. I suspect it's the physics. I'll run a small test this evening with just tweening, or even Move(), the models towards the player to see if that helps narrow it down.

 

Jen mentioned before SetPosition was slow so maybe that causes something behind the scenes?

 

L4D can have 200+ zombies on screen at once. That should be the bar I would think. Sure poly count will come into play there but something else is also going on.

Link to comment

@Josh, I'm not saying that the performance is bad. I just want to know what causes the performance difference between zombie1 and the crawler, so I can avoid it when designing and developing my Leadwerks game.

Link to comment
"Jen mentioned before SetPosition was slow so maybe that causes something behind the scenes?"

There's zero context to that statement. It could be under specific circumstances, but there's nothing inherently inefficient about the call.

 

Left 4 Dead uses a highly specialized multithreading routine that is designed specifically to handle Left 4 Dead zombies. It cannot be used for general games like Leadwerks can. Those kinds of highly specialized solutions are designed for one specific purpose, and that's all they do. For example, there's basically no physics in that game and you will never see an enemy kick over a box like ours do.

 

It might be possible to get the character physics to run faster, but I am not going to lose sleep over a general-purpose flexible routine not running as fast as Left 4 Dead's zombie AI.

 

"@Josh, I'm not saying that the performance is bad. I just want to know what causes the performance drops between the zombie1 and crawler. I want to avoid the causes when designing and developing my Leadwerks game."

I don't want to spend time evaluating theoretical setups because that will never end. These are both within the realm of acceptable performance. If you have a project you want me to look at, feel free to send it my way. The further you are along in the publishing process, the higher a priority I give it.

Link to comment

"For example, there's basically no physics in that game and you will never see an enemy kick over a box like ours do."

 

You're not making a game, so that's a strange comment to make. Most games don't require physics on AI characters, so the fact that characters in LE can kick over a box doesn't really add much value. The ability to choose would be better. That would start with access to the Recast path; access from Lua specifically would really help.

Link to comment

That won't do you any good because you'll have to avoid collision with other AI, and a plotted path won't do that.

Link to comment

I should add that there are changes to the navmesh and character physics systems I want to make, so spending a lot of time trying to make it faster right now is not a priority. I'd rather implement the changes I want to make and then see if it can be made faster.

Link to comment

"That won't do you any good because you'll have to avoid collision with other AI, and a plotted path won't do that."

 

I think it gives us more control and options specific to our games. We can check our own BB (bounding box) collisions once every 200 ms and adjust accordingly. We can have flocking logic to make sure they don't collide with each other but are all following one general path. This is generally what games do anyway. They don't allow character controllers to collide and slide over each other like what happens in LE when two or more collide. In my view, that's not a reason to withhold the path data.

 

There is a common theme of some people asking for access to data the engine already has. These topics are for the ones who push the limits. It's data that already exists. I'm curious about the constant pushback on these requests. You provide the data and one system that works with the data. If that system doesn't meet the user's needs, the user is responsible for making it work. This way you have something for the beginner and something for the advanced user. Win-win.

 

Steering is the term for what I'm talking about and I think can be combined with the path and dynamic obstacles in realtime at an interval that works best for the game. https://gamedevelopment.tutsplus.com/tutorials/understanding-steering-behaviors-collision-avoidance--gamedev-7777

 

Open source library for steering: http://opensteer.sourceforge.net

 

This article is amazing! http://www.valvesoftware.com/publications/2009/ai_systems_of_l4d_mike_booth.pdf
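For readers unfamiliar with the steering idea raised in this comment, here is a minimal 2D sketch of a separation behavior blended with movement toward the next path point. It is a generic illustration, not the engine's built-in system or the OpenSteer API; the function names, the radius, and the weight are hypothetical.

```python
import math

def separation(agent, neighbors, radius):
    """Return a 2D push-away vector from neighbors closer than radius."""
    push_x = push_y = 0.0
    for nx, ny in neighbors:
        dx, dy = agent[0] - nx, agent[1] - ny
        dist = math.hypot(dx, dy)
        if 0.0 < dist < radius:
            strength = (radius - dist) / radius  # closer neighbors push harder
            push_x += dx / dist * strength
            push_y += dy / dist * strength
    return push_x, push_y

def steer(agent, waypoint, neighbors, radius=2.0, avoid_weight=1.5):
    """Blend the unit direction to the next path point with the separation push."""
    to_x, to_y = waypoint[0] - agent[0], waypoint[1] - agent[1]
    length = math.hypot(to_x, to_y) or 1.0  # avoid dividing by zero at the waypoint
    sep_x, sep_y = separation(agent, neighbors, radius)
    return (to_x / length + avoid_weight * sep_x,
            to_y / length + avoid_weight * sep_y)

# Hypothetical case: a neighbor stands one unit ahead, directly on the path.
print(steer((0.0, 0.0), (10.0, 0.0), [(1.0, 0.0)]))
```

A function like `steer` could be re-evaluated at a fixed interval, such as the 200 ms suggested in the comment, rather than every frame.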

Link to comment

I really don't like when people try to replace chunks of the engine functionality. It creates a runaway list of new requests that don't add any value for most users. We already have a built-in system that handles flocking behavior.

Link to comment
