moka
Posted: Wed Oct 28, 2009 8:36 am
Joined: Mon Jun 02, 2008 8:24 pm | Posts: 409 | Location: Kiel - Germany
This rocks, braaa! I need a new Mac!!! NOW!!!! Wow, I just took a look at the src. Really slim too!

zach
Posted: Wed Oct 28, 2009 10:55 am
Site Admin | Joined: Mon Feb 05, 2007 9:31 pm | Posts: 1806 | Location: brooklyn
Wow, nice! Can you post the "MyProgram.cl" source too? Am I looking in the wrong place? I guess it would be in the data folder of the particles example, but I see only the source. I'm very curious to see how it's done.
BTW, here's some very fast particle-particle code:
http://www.makingthingsmove.org/blog/?p=251
http://www.makingthingsmove.org/other/f ... rticle.zip
It might be interesting to see whether OpenCL pushes this further (I had 8-10k particles interacting well).
Take care,
zach

pelintra
Posted: Wed Oct 28, 2009 4:18 pm
Joined: Sat Mar 29, 2008 1:05 pm | Posts: 261 | Location: Lisbon, Portugal
Doh! I forgot to include the OpenCL program! It's online now. @Zach thanks for that, I'll try to use it along with OpenCL to see how far it can be pushed.

Gestalt
Posted: Wed Oct 28, 2009 5:55 pm
Joined: Wed Jul 16, 2008 8:07 am | Posts: 67 | Location: Duesseldorf
Hey! When running pelintra's example I get 30 fps with
    opencl.setup(CL_DEVICE_TYPE_CPU);
10 fps with
    opencl.setup(CL_DEVICE_TYPE_GPU);
and also 10 fps with
    opencl.setup(CL_DEVICE_TYPE_ALL);
Did I miss anything? Any ideas? I'm on an Nvidia 9400 Mac mini with Snow Leopard, and I thought this would be supported.
Regards,
Gestalt

pelintra
Posted: Wed Oct 28, 2009 6:10 pm
Joined: Sat Mar 29, 2008 1:05 pm | Posts: 261 | Location: Lisbon, Portugal
Hi, in that code I read back from the OpenCL buffer on every frame, and copying data back and forth from the GPU is very heavy. That could explain why you get slower framerates in GPU mode. Jonas Jongejan is working on some particles that are uploaded to a VBO at startup; OpenCL then updates the VBO directly, so you never have to copy stuff back and forth. The code I posted is just an early experiment, trying out something new. I'm sure there are far better ways of doing things.

memo
Posted: Thu Oct 29, 2009 5:22 pm
Joined: Tue May 27, 2008 10:03 am | Posts: 691 | Location: London, UK
Hey guys! Sorry for being quiet lately, I've been quite tied up. I've committed my current code to SVN. It has your changes in, Rui, as well as loads of other stuff (most importantly sharing a context with OpenGL, multidimensional data and multiple devices; see the changelog for more).

Unfortunately there is a tiny change that breaks backwards compatibility. It's in kernel::run(), to allow better support for multidimensional execution: instead of passing a single 'size' parameter, you pass in an array of sizes. Alternatively, you can use the convenience methods run1D, run2D and run3D, which take the sizes as individual parameters and create the array for you. I've updated the examples so you can see.

Readbuffer is obviously not very good, but in this case it isn't really your bottleneck. The problem mainly lies in what is happening in the kernel: that for loop is really killing it. I've attached a modified version of your example, Rui, which has the following changes (along with mods to ofxOpenCL):
1. Instead of reading back the entire particle array, it reads back just the positions into a separate buffer (so much less data is read back).
2. Instead of using a custom Vec2 type in the kernel, it uses the built-in float2 type and collapses all the vector operations (pos += vel instead of pos.x += vel.x; pos.y += vel.y;).
3. The new ofxOpenCL has OpenGL context sharing, so buffers can be shared, and images can be OpenGL textures and vice versa. So I've also mapped the particle return buffer to a VBO: no reading back at all, everything goes straight from the GPU to render. Nice.

So the results are:
- CPU: 30 fps
- GPU, original: 13 fps
- GPU, with kernel optimisations (#1 + #2 above): 15 fps
- GPU, writing straight to the VBO and avoiding readback (#3 above): 20 fps

So you can see it's still much slower on the GPU. BUT I think the reason for that is more about synchronisation. Loops and ifs in kernels are not very good unless you optimise the hell out of them.
Some hardcore tips on that are here: http://www.macresearch.org/opencl_episode6 (it's a bit over my head at the moment). So I added another kernel without the for loop (no particle-particle interaction, only mouse interaction), toggled with #define DO_COLLISION. I tried that with 1M particles, and the results are:
- CPU: 12 fps
- GPU, outputting straight to the VBO for rendering: 60 fps
That is for 1,000,000 particles with basic physics and mouse interaction.

So it all depends on what's going on inside the kernel. I'm sure the particle-particle interaction can be done too, but it needs to be written with smart synchronization between all those thread groups, blocks, processing clusters, streaming processors and all the other hardcore hi-tech stuff.

P.S. Rui, your example is a really nice and simple particle example, so I hope you don't mind that I've included this modded version bundled with ofxOpenCL (with a mention of and link to your Google Code page). I'm working on image buffer types next (optical flow at 640x480 at 60 fps), which should be fun.
http://code.google.com/p/ofxmsaof/downloads/list
http://ofxmsaof.googlecode.com/files/ofxOpenCL_v0.2.zip

pelintra
Posted: Fri Oct 30, 2009 5:17 pm
Joined: Sat Mar 29, 2008 1:05 pm | Posts: 261 | Location: Lisbon, Portugal
@memo Cool! When I get some time I'll upload a version of my particles using the newer ofxOpenCL instead of my modded version. 200 fps with 1M particles?? Nice.

memo
Posted: Mon Nov 16, 2009 8:03 pm
Joined: Tue May 27, 2008 10:03 am | Posts: 691 | Location: London, UK
Hey guys, I just uploaded v0.3: http://code.google.com/p/ofxmsaof/downloads/list
The major new addition is image objects, and mapping them to OpenGL textures (plus internal restructuring of buffers: Readbuffer, writebuffer etc. are now methods of ofxOpenCLImage or ofxOpenCLBuffer). More info in the changelog.

This stuff could have been done with shaders, but OpenCL just makes it simpler. Instead of creating an FBO, then rendering your texture to a quad into the FBO with the correct OpenGL viewport setup, you just say 'run this kernel on this image'. Much simpler.

P.S. This example won't work on the CPU (or will be super slow), because the OpenCL image objects are mapped to OpenGL textures, i.e. they share the same memory space. The code to do that is 2 lines and is also Mac-specific (it gets the current OpenGL context and passes it to OpenCL). If anyone knows the Windows / Linux equivalent, let me know.

EDIT: I don't know why the images aren't showing, but they are here:
http://www.twitpic.com/ps54e/full
http://www.twitpic.com/ps54n/full

moka
Posted: Wed Nov 18, 2009 3:53 pm
Joined: Mon Jun 02, 2008 8:24 pm | Posts: 409 | Location: Kiel - Germany
Sweet, so OpenCL also allows you to edit the whole image array at once, instead of one fragment at a time as with shaders? That sounds really neat for effects where you also need to look up other pixels.

memo
Posted: Wed Nov 18, 2009 4:47 pm
Joined: Tue May 27, 2008 10:03 am | Posts: 691 | Location: London, UK
Actually it is one fragment at a time, like shaders (more precisely, one element at a time). You can access other pixels in shaders too; that's how blurs work, surely you've done that. The cool thing about OpenCL is the workflow: you don't need to bother with rendering into an FBO at the right dimensions and setting viewport properties to make sure it all maps correctly, etc.

moka
Posted: Thu Nov 19, 2009 8:24 am
Joined: Mon Jun 02, 2008 8:24 pm | Posts: 409 | Location: Kiel - Germany
memo wrote: "actually it is one fragment at a time like shaders (actually it is one element at a time). You can access other pixels in shaders too - that's how blurs work, surely you've done that  the cool thing about opencl is the workflow. You don't need to bother about rendering into an fbo at the right dimensions and setting viewport properties to make sure it all maps correctly etc.."
Yeah, sorry, I meant writing to more than one pixel at a time.

Karel
Posted: Thu Dec 17, 2009 4:48 pm
Joined: Mon Nov 16, 2009 3:58 pm | Posts: 10
Hello,
Maybe this is a redundant question, but I'm trying to port it to Windows. OpenCL/opencl.h becomes CL/cl.h, but what does mach_time.h become? sys/time.h?
Also, I would like to express my joy at this beautiful framework. I left this world years ago with asm demos... Now I am back, and the old days seem like the Stone Age to me... but that was just an aside.
Cheers, Karel.

Karel
Posted: Thu Dec 17, 2009 4:53 pm
Joined: Mon Nov 16, 2009 3:58 pm | Posts: 10
OK, I just commented out the timing functions.

memo
Posted: Thu Dec 17, 2009 4:53 pm
Joined: Tue May 27, 2008 10:03 am | Posts: 691 | Location: London, UK
Hi, I don't think there is a direct, simple equivalent of mach_time. But you shouldn't really need it: it was only there for precise (nanosecond) timing, and it is only used in one of the examples, not in the actual addon. The biggest issue in making it cross-platform is in ofxOpenCL.cpp, in setupFromOpenGL (which sets up the OpenCL context from the OpenGL context). I don't know the Windows equivalent for retrieving the OpenGL context, so if you can sort that out, it would be great!

Karel
Posted: Thu Dec 17, 2009 7:58 pm
Joined: Mon Nov 16, 2009 3:58 pm | Posts: 10