Feature #2433

Investigate multi-threaded performance in the case of parameter distribution.

Added by pospelov 4 months ago. Updated 2 months ago.

Status: Backlog        Start date: 06 Apr 2020
Priority: Normal       Due date:
Assignee: -            % Done:

Target version: -



#1 Updated by pospelov 4 months ago

  • Assignee set to pospelov

#2 Updated by pospelov 4 months ago

I have created the first pull request to improve performance in a multi-threaded environment: https://github.com/scgmlz/BornAgain/pull/920.
It contains a functional test, some performance improvements, and the results of performance measurements.

For the moment, it is not clear what causes the large performance degradation in the "Simple sample, small detector" scenario. It is especially noticeable when switching from a single thread to two threads (see the comments on the pull request). "Callgrind" and "gperf" in Qt Creator do not show anything suspicious.

My only explanation is that caching of specular coefficients (which can take up to 30% of the total CPU time) starts to dominate when multiple threads are used and the sample itself is simple.

Here is the list of possible improvements:

  • Make the specular coefficients cache a common pool shared by all threads (with the corresponding mutexes everywhere).

Or at least profile 1 vs. 2 threads with caching disabled.
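
A minimal sketch of what such a common pool could look like, assuming the cache can be keyed by a simple integer and the coefficient can be represented by one complex number. The names SharedCoefficientCache, Coefficient and getOrCompute are hypothetical and do not come from the BornAgain code base; the real cache key and value types are likely more involved.

```cpp
#include <complex>
#include <cstddef>
#include <functional>
#include <mutex>
#include <unordered_map>

// Hypothetical stand-in for a cached specular coefficient; not a BornAgain type.
using Coefficient = std::complex<double>;

// Hypothetical cache shared by all worker threads and guarded by one mutex.
class SharedCoefficientCache {
public:
    // Returns the cached value for `key`, computing and storing it on a miss.
    Coefficient getOrCompute(long key, const std::function<Coefficient(long)>& compute)
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            auto it = m_cache.find(key);
            if (it != m_cache.end())
                return it->second;
        }
        // Compute without holding the lock so other threads are not blocked.
        // Two threads may compute the same key at once; only one result is kept.
        Coefficient value = compute(key);
        std::lock_guard<std::mutex> lock(m_mutex);
        // emplace leaves an existing entry untouched if another thread was faster.
        return m_cache.emplace(key, value).first->second;
    }

    std::size_t size() const
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_cache.size();
    }

private:
    mutable std::mutex m_mutex;
    std::unordered_map<long, Coefficient> m_cache;
};
```

Whether a single mutex helps or hurts here depends on the hit rate and on contention between threads, which is why profiling with caching disabled is a useful baseline before committing to this design.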

  • Move simulation "normalize" inside the thread.
  • Make SimulationElement rely on "const IPixel*" instead of "unique_ptr" to avoid the costly IPixel::clone.

The difficulty here is the Monte-Carlo integration and the existence of the SimulationElement::SimulationElement(const SimulationElement &other, double x, double y) constructor.
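
The cost difference between the two ownership models can be illustrated with a small sketch. All class names below (PixelLike, CountingPixel, OwningElement, BorrowingElement) are hypothetical stand-ins rather than the actual BornAgain IPixel or SimulationElement, and the sketch does not attempt to solve the Monte-Carlo integration difficulty mentioned above.

```cpp
#include <memory>
#include <utility>

// Hypothetical stand-in for a detector pixel interface; not the BornAgain IPixel.
class PixelLike {
public:
    virtual ~PixelLike() = default;
    virtual std::unique_ptr<PixelLike> clone() const = 0;
    virtual double solidAngle() const = 0;
};

// Pixel that counts how many times it has been deep-copied.
class CountingPixel : public PixelLike {
public:
    explicit CountingPixel(double solid_angle) : m_solid_angle(solid_angle) {}
    std::unique_ptr<PixelLike> clone() const override
    {
        ++clone_count; // record every deep copy
        return std::make_unique<CountingPixel>(m_solid_angle);
    }
    double solidAngle() const override { return m_solid_angle; }
    static int clone_count;

private:
    double m_solid_angle;
};
int CountingPixel::clone_count = 0;

// Owning variant: every copy of the element deep-copies its pixel.
class OwningElement {
public:
    explicit OwningElement(std::unique_ptr<PixelLike> pixel) : m_pixel(std::move(pixel)) {}
    OwningElement(const OwningElement& other) : m_pixel(other.m_pixel->clone()) {}
    double solidAngle() const { return m_pixel->solidAngle(); }

private:
    std::unique_ptr<PixelLike> m_pixel;
};

// Borrowing variant: copies share one pixel, which must outlive all elements.
class BorrowingElement {
public:
    explicit BorrowingElement(const PixelLike* pixel) : m_pixel(pixel) {}
    double solidAngle() const { return m_pixel->solidAngle(); }

private:
    const PixelLike* m_pixel; // non-owning; lifetime is managed elsewhere
};
```

In this sketch, copying a BorrowingElement never calls clone(), but the design only works if something else (for example the detector) keeps the pixels alive for the whole simulation.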

#3 Updated by pospelov 2 months ago

  • Status changed from Sprint to Backlog
  • Assignee deleted (pospelov)
  • Target version deleted (Sprint 43)
