Investigate multi-threaded performance in the case of parameter distribution.
Status: Backlog | Start date: 06 Apr 2020
See discussion at
I have created a first pull request for better performance in a multi-threaded environment: https://github.com/scgmlz/BornAgain/pull/920.
It contains a functional test, some performance improvements, and results of performance measurements.
For the moment, it is not clear what causes the huge performance degradation in the "Simple sample, small detector" scenario. It is especially noticeable when switching from a single thread to two threads (see the comments on the pull request). "Callgrind" and "gperf" in Qt Creator do not show anything suspicious.
My only explanation is that caching of specular coefficients (which can take up to 30% of the total CPU time) starts to dominate when multiple threads are used and the sample itself is simple.
Here is the list of possible improvements:
- Make the specular coefficients cache a common pool shared by all threads (with corresponding mutexes everywhere).
Or at least profile 1 vs. 2 threads with caching disabled.
- Move simulation "normalize" inside the thread.
- Make SimulationElement rely on "const IPixel*" instead of "unique_ptr" to avoid the costly IPixel::clone.
The difficulty here is Monte-Carlo integration and the existence of the SimulationElement::SimulationElement(const SimulationElement &other, double x, double y) constructor.
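To make the first item more concrete, here is a minimal sketch of a cache shared by all threads and guarded by a single mutex. It is not BornAgain code: `Coefficients`, `SharedCoefficientCache`, `fill_from_threads`, and the integer key are hypothetical stand-ins for the real specular coefficient types and lookup keys. The value is computed outside the lock, so two threads may occasionally compute the same entry, but neither blocks the other while computing.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for cached specular coefficients (not the real BornAgain type).
struct Coefficients {
    double reflection;
    double transmission;
};

// A single cache shared by all worker threads, protected by one mutex.
class SharedCoefficientCache {
public:
    using Compute = std::function<Coefficients(int)>;

    explicit SharedCoefficientCache(Compute compute) : m_compute(std::move(compute)) {}

    // Returns the cached value for `key`, computing and storing it on a miss.
    Coefficients get(int key)
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            auto it = m_cache.find(key);
            if (it != m_cache.end())
                return it->second;
        }
        // Compute without holding the lock so other threads are not stalled.
        Coefficients value = m_compute(key);
        std::lock_guard<std::mutex> lock(m_mutex);
        // emplace keeps the existing entry if another thread inserted it first.
        return m_cache.emplace(key, value).first->second;
    }

    std::size_t size() const
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_cache.size();
    }

private:
    Compute m_compute;
    mutable std::mutex m_mutex;
    std::unordered_map<int, Coefficients> m_cache;
};

// Lets n_threads threads request the same n_keys keys; returns the final cache size.
std::size_t fill_from_threads(SharedCoefficientCache& cache, int n_threads, int n_keys)
{
    assert(n_threads > 0 && n_keys >= 0);
    std::vector<std::thread> workers;
    for (int t = 0; t < n_threads; ++t)
        workers.emplace_back([&cache, n_keys] {
            for (int key = 0; key < n_keys; ++key)
                cache.get(key);
        });
    for (auto& worker : workers)
        worker.join();
    return cache.size();
}
```

Whether a shared pool actually helps depends on lock contention versus the cost of recomputing coefficients per thread, which is why profiling 1 vs. 2 threads with caching disabled remains a useful first check.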