Author: jroge
Posted: Wed Dec 03, 2008 10:41 pm
Joined: Fri May 04, 2007 11:43 am; Posts: 157
anyone here using openMP with OF?
i tried it today and didn't get very far. the main reason, i guess, was some incompatibility with normal threading (pthreads).
i'd be interested in other experiences because i'd like to switch from pthreads to openMP if that's worth the effort.
best
joerg
Author: hahakid
Posted: Thu Dec 04, 2008 10:34 am
Joined: Wed Dec 12, 2007 10:12 pm; Posts: 117; Location: London
Author: jroge
Posted: Thu Dec 11, 2008 10:10 pm
Joined: Fri May 04, 2007 11:43 am; Posts: 157
today i got openMP working. i converted my multithreaded app into a single-threaded one and then parallelized the for loops that do most of the processing. it worked, and much better than the multithreaded version: the load is distributed evenly over the 8 cores of the mac pro.
the app processes the feeds of three webcams and does some heavy real-time image manipulation with openCV. surprisingly, it's much faster to process the data of the three cameras sequentially and have just a few parallel loops.
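A minimal sketch of the structure described above, assuming each camera delivers one 8-bit grayscale frame. The `Frame` type, `invert_frame()` and `process_cameras()` are hypothetical stand-ins for the real openCV processing. The cameras are handled one after another, and only the per-pixel loop inside each frame carries the pragma:

```c
#include <stddef.h>

/* Hypothetical frame type: one 8-bit grayscale image per camera. */
typedef struct {
    unsigned char *pixels;
    int width;
    int height;
} Frame;

/* Invert every pixel of one frame. Each iteration reads and writes only
   pixel i, so there are no dependencies between iterations and OpenMP
   can split the loop across all available cores. */
static void invert_frame(Frame *f)
{
    long n = (long)f->width * f->height;

    #pragma omp parallel for
    for (long i = 0; i < n; i++)
        f->pixels[i] = (unsigned char)(255 - f->pixels[i]);
}

/* The cameras themselves are processed sequentially; only the per-pixel
   work inside each frame runs in parallel. */
void process_cameras(Frame *frames, int num_cameras)
{
    for (int c = 0; c < num_cameras; c++)
        invert_frame(&frames[c]);
}
```

Each iteration touches only its own pixel, so the loop needs no locks. If the file is compiled without OpenMP, the pragma is ignored and the same code simply runs on one core.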
once i had openMP working, the conversion took just half an hour: i just added a
Code: #pragma omp parallel for
here and there and checked the performance improvement. i had to remove one or two of the #pragmas because they slowed the program down, apparently because the overhead was bigger than the performance gain.
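When a loop is sometimes too small for the parallel version to pay off, OpenMP's standard `if` clause is an alternative to deleting the pragma: the loop only runs in parallel when the condition holds. A sketch, where `PARALLEL_THRESHOLD` and `scale()` are hypothetical and the cutoff is a placeholder that would have to be found by measuring:

```c
/* Hypothetical cutoff below which starting threads costs more than it
   saves. The right value depends on the loop body and the machine. */
#define PARALLEL_THRESHOLD 10000

/* Multiply every element by a factor. The if clause makes OpenMP run
   the loop on a single thread when n is small. */
void scale(float *data, long n, float factor)
{
    #pragma omp parallel for if(n > PARALLEL_THRESHOLD)
    for (long i = 0; i < n; i++)
        data[i] *= factor;
}
```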
to use openMP (on osx) you need Xcode 3.1: switch the compiler to gcc 4.2 in the project settings, tick the "use openMP" checkbox and add some #pragmas. that's it.
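A quick way to confirm that the setting actually took effect is to check the standard `_OPENMP` macro, which the compiler defines only when it builds with OpenMP support. A small sketch; `openmp_thread_count()` and `report_openmp()` are illustrative names, not part of any library:

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Number of threads OpenMP would use for a parallel region, or 1 when
   the file was compiled without OpenMP support. */
int openmp_thread_count(void)
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}

void report_openmp(void)
{
#ifdef _OPENMP
    /* _OPENMP expands to the supported spec date, e.g. 200505 for OpenMP 2.5. */
    printf("OpenMP enabled (_OPENMP = %d), up to %d threads\n",
           _OPENMP, openmp_thread_count());
#else
    printf("OpenMP NOT enabled: #pragma omp lines are ignored\n");
#endif
}
```

If this reports that OpenMP is not enabled, the `#pragma omp` lines are silently ignored and every loop runs serially.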
best
joerg
Author: memo
Posted: Fri Dec 12, 2008 1:26 am
Joined: Tue May 27, 2008 10:03 am; Posts: 691; Location: London, UK
wow that's awesome! so openmp parallelizes for loops? how does that work? do you write a kernel routine and send that to openmp or something instead of a for loop? do you need mutexes or anything?
Author: jroge
Posted: Fri Dec 12, 2008 9:06 am
Joined: Fri May 04, 2007 11:43 am; Posts: 157
no, it's much simpler. you just add a #pragma statement, and as long as your loops stick to some simple rules (for example, no data dependencies between loop iterations) they get parallelized without any additional code. like in this example:
Code:
#define N 100000

int main(int argc, char *argv[])
{
    int i, a[N];

    /* each iteration writes only a[i], so the iterations are independent */
    #pragma omp parallel for
    for (i = 0; i < N; i++)
        a[i] = 2 * i;

    return 0;
}
the line #pragma omp parallel for is the openMP statement.
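To illustrate the "no data dependencies" rule, here is a sketch contrasting a loop like the one in the example with a loop that has a dependency between iterations. Both function names are made up for illustration:

```c
/* Safe to parallelize: iteration i only writes a[i] and never reads a
   value written by another iteration. */
void fill_double(int *a, int n)
{
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        a[i] = 2 * i;
}

/* NOT safe to parallelize as written: iteration i reads a[i - 1], which
   the previous iteration writes. Adding "#pragma omp parallel for" here
   would still compile but could give wrong results, so it stays serial. */
void running_sum(int *a, int n)
{
    for (int i = 1; i < n; i++)
        a[i] = a[i - 1] + a[i];
}
```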
it's really easy and fun.
there's some additional support for mutexes and other stuff but i didn't need that.
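For loops that do need synchronization, OpenMP offers standard clauses and directives such as `reduction` and `critical`. A sketch of both, using hypothetical pixel-statistics functions; the `critical` block behaves like a mutex around the code it encloses:

```c
/* Sum of all pixel values. Without reduction(+:total), all threads would
   update the shared "total" at once (a data race). The clause gives each
   thread a private copy and adds the copies together at the end. */
long sum_pixels(const unsigned char *pixels, long n)
{
    long total = 0;

    #pragma omp parallel for reduction(+:total)
    for (long i = 0; i < n; i++)
        total += pixels[i];

    return total;
}

/* Brightest pixel value. Each thread scans its share of the loop into a
   private local_best, then merges it into the shared result inside a
   critical section, which only one thread at a time may enter. */
int max_pixel(const unsigned char *pixels, long n)
{
    int best = 0;

    #pragma omp parallel
    {
        int local_best = 0;   /* declared inside the region: private */

        #pragma omp for
        for (long i = 0; i < n; i++)
            if (pixels[i] > local_best)
                local_best = pixels[i];

        #pragma omp critical
        {
            if (local_best > best)
                best = local_best;
        }
    }
    return best;
}
```

Merging once per thread instead of locking on every pixel keeps the time spent inside the critical section small.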
best
joerg
Author: memo
Posted: Fri Dec 12, 2008 11:17 am
Joined: Tue May 27, 2008 10:03 am; Posts: 691; Location: London, UK
wow that's amazing, does the pragma apply to the very next line? I'm wondering whether it's possible to have it apply to the for loops in an external library, like opencv...
Author: jroge
Posted: Fri Dec 12, 2008 11:18 am
Joined: Fri May 04, 2007 11:43 am; Posts: 157
you can configure openCV for openMP. i think there's a switch for ./configure.
the only problem with openMP is that i was not able to use it with pthreads.
best
joerg
Author: memo
Posted: Fri Dec 12, 2008 11:29 am
Joined: Tue May 27, 2008 10:03 am; Posts: 691; Location: London, UK
sounds great, will check it out thanks...
Author: moka
Posted: Sun Jan 04, 2009 1:55 pm
Joined: Mon Jun 02, 2008 8:24 pm; Posts: 409; Location: Kiel - Germany
Hi,
I am also really interested in getting my for loops parallelized. I did what you said, but when I change the GCC version to 4.2 it throws some errors:
8 of these: /Developer/SDKs/MacOSX10.4u.sdk/usr/include/stdarg.h:4:25: error: stdarg.h: No such file or directory
If I only toggle "use openMP" and stick to GCC version 4.0, I don't get the errors, but the speed doesn't seem to improve either.
Any ideas?
Thanks!
EDIT:
Okay, I read a little bit about all that stuff. As far as I can tell you need GCC 4.2 to make use of OpenMP.
GCC 4.2 only works with the Base SDK for Mac OSX 10.5. If I select that with my OF project it throws a whole bunch of errors since OF does not really like that SDK as far as I can tell.
If I make an empty Cocoa project, everything works fine.
So how did you manage to do that? If it is too difficult, would you mind uploading an empty Xcode project using OF and openMP?
Thank you!
Author: jroge
Posted: Sun Jan 04, 2009 9:42 pm
Joined: Fri May 04, 2007 11:43 am; Posts: 157
Author: moka
Posted: Sun Jan 04, 2009 11:49 pm
Joined: Mon Jun 02, 2008 8:24 pm; Posts: 409; Location: Kiel - Germany
Author: jroge
Posted: Mon Jan 05, 2009 7:48 am
Joined: Fri May 04, 2007 11:43 am; Posts: 157
yes it's with poco (which is causing most of the trouble). and i also use the 10.5 sdk.
best
joerg
Author: moka
Posted: Mon Jan 05, 2009 11:42 am
Joined: Mon Jun 02, 2008 8:24 pm; Posts: 409; Location: Kiel - Germany
Wow thank you, I got it to work!
What an incredible change in performance. Even though I am on a first-generation MBP (I think they have 2 cores, not sure though), I get quite a big speed improvement for this simple loop:
Code:
//#pragma omp parallel for
for(int i=0; i<1000000; i++){ }
approx. 410 fps using openMP and 270 without it.
I am seriously impressed with this
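One caveat when timing an empty loop: with optimization enabled, the compiler is allowed to remove a loop that has no side effects, so the numbers may not reflect real work. A sketch of a benchmark whose iterations do some arithmetic and whose result is returned, so the loop cannot be discarded. `busy_loop()` and its arithmetic are arbitrary illustrations, and `omp_get_wtime()` is only called when OpenMP is enabled:

```c
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Elapsed-time helper: wall-clock time via omp_get_wtime() under OpenMP,
   otherwise clock(), which measures CPU time (fine for a single thread). */
static double now_seconds(void)
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    return (double)clock() / CLOCKS_PER_SEC;
#endif
}

/* Each iteration does a little arithmetic; the per-iteration results are
   combined with a reduction and returned, so the work is observable.
   The elapsed time is written to *seconds. */
long long busy_loop(int n, double *seconds)
{
    long long total = 0;
    double start = now_seconds();

    #pragma omp parallel for reduction(+:total)
    for (int i = 0; i < n; i++) {
        long long x = i;
        for (int k = 0; k < 100; k++)
            x = (x * 31 + 7) % 1000003;
        total += x;
    }

    *seconds = now_seconds() - start;
    return total;
}
```

Comparing `*seconds` for builds with and without OpenMP gives a fairer picture than frame rates of an empty loop; the returned total should be identical in both builds.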
Author: joshuajnoble
Posted: Mon Jan 05, 2009 3:10 pm
Joined: Thu May 31, 2007 2:32 pm; Posts: 292; Location: PDX
Wow. That's super impressive. So, excuse my naivety on these matters, but OpenMP is just being used to parallelize the for loops that do the image processing for 3 simultaneous camera feeds, and there are no shared memory problems with that? Did you have to do anything special to get the camera data into a parallelizable state that's markedly different from what you would do for a multithreaded app?
Author: jroge
Posted: Mon Jan 05, 2009 4:59 pm
Joined: Fri May 04, 2007 11:43 am; Posts: 157
the fun thing is: i didn't need to rewrite much of the program beyond adding the pragmas and making sure the for loops have no data dependencies between iterations. everything else is sequential, so the program processes the data of the three cameras one after the other. only the processing loops are parallel.
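Besides dependencies between iterations, the other thing to watch in such loops is scratch variables. In C, a variable declared outside a `parallel for` is shared between threads by default, while one declared inside the loop body is private. A sketch with a hypothetical `threshold()` function that uses the `private` clause for a variable declared outside the loop:

```c
/* Turn a grayscale image into a binary mask. "input" is only read and
   "mask" is written only at index i, so both can safely be shared. */
void threshold(const unsigned char *input, unsigned char *mask,
               long n, unsigned char level)
{
    unsigned char value;   /* declared outside the loop: shared by default */

    /* private(value) gives every thread its own copy; without it, threads
       could overwrite each other's "value" between the two lines below. */
    #pragma omp parallel for private(value)
    for (long i = 0; i < n; i++) {
        value = input[i];
        mask[i] = (value > level) ? 255 : 0;
    }
}
```

Declaring `value` inside the loop body would have the same effect without the clause.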
best
joerg
|