doFORC is a portable (standalone) application that works on various operating systems, is built using only free libraries, and is freely available to the scientific community.

Although doFORC is mainly dedicated to computing FORC diagrams, it can also smooth, and approximate the derivatives of, a general set of arbitrarily distributed two-dimensional points.

**The main features of the doFORC tool are:**

- runs natively on Microsoft Windows operating systems, and via the Wine compatibility layer on Linux-like and Macintosh operating systems
- the easy-to-use graphical user interface (GUI) allows users to import the input data, set the fitting parameters, graphically represent (2D, 3D, and projection) both input and output data, and export the graphs to image files
- both the algorithms and the GUI are implemented in Fortran, with the graphical interface built on the DISLIN library. Because both are written in Fortran, they interact directly, without the need for temporary working files saved on the computer.
- allows the choice of one of the four implemented nonparametric regression procedures: LOESS and three modified Shepard methods further modified for noisy data. The procedures allow great flexibility because no assumptions about the parametric form of the regression surface are needed. Thus the users can try different methods using their data and select one (or more) that is best suited to their needs.
- allows the use of different kernel functions
- input data may have various formats, including the PMC MicroMag format for which the drift correction can be performed
- allows the use of user weights associated with each data point, weights that indicate the precision of the information contained in the associated observation
- removes points that are closer than some tolerance (duplicate or nearby points) from the input data
- the input data can be cropped to ignore certain parts of the input points
- allows the use of a scale factor for input data to change the shape of the neighborhood, considering the points lying on an ellipse centered at the given point to be equidistant from the given point. This feature is useful when the variables have different scales.
- allows the standardization of the input data to change the shape of the neighborhood. This feature is useful when the variables have significantly different scales. The standardization is accomplished using Winsorized mean and standard deviation of each variable. Winsorized values are robust scale estimators in that extreme values of a variable are discarded (the smallest and largest 5% of the data) before estimating the data scaling.
- robust smoothing minimizes the influence of outlying data (outliers), and the user can choose among different robustness weight functions
- output data can be provided, at the user's choice, at:
  - the input points
  - a regular grid in the $\left( {h_{{\mathrm{applied}}} \ge h_{{\mathrm{reversal}}},\; h_{{\mathrm{reversal}}}} \right)$ half-plane
  - a regular grid in the $\left( {h_{{\rm{coercive}}} \ge 0,\; h_{{\rm{interaction}}}} \right)$ half-plane
  - a general rectangular regular grid
  - user points
- the output consists of:
  - predicted smoothed values at the points from the input file, along with other information (residuals, smoothed residuals, histogram of the residuals, influences, confidence intervals for the fit)
  - requested derivatives at the output points
  - requested statistics, chosen from (RSS, RSSm, RSE, DF1, DF2, DF3, $\delta _1$, $\delta _2$, $\rho$, AICC, AICC1, GCV)

- allows the visual analysis of the agreement between the input data and the smoothed values. This is a mandatory step, as it allows the identification of underfitting and overfitting. If the two sets of data differ significantly, or if the smoothed data follow a noisy input too closely, then the errors in estimating the derivatives may be quite large.
- performs statistical inference (deduces properties of data sets from a set of observations and hypotheses) provided that the error distribution satisfies some basic assumptions
- to perform diagnostics and assess goodness of fit, doFORC computes the residuals, which characterize the difference between the actual observed value and the predicted value; the generalized cross-validation GCV, which measures the predictive performance of the model; two information criteria, AICC and AICC1, which quantify the information that is lost by using an approximate model on the available data; and three degrees of freedom, DF1, DF2, and DF3, which allow comparing the amounts of smoothing performed by different smoothing methods.
- based on the above criteria doFORC can perform automatic smoothing parameter selection. Although the default method for selecting the smoothing parameter value is often satisfactory, it is often a good practice to examine how the fit varies with the smoothing parameter. In some cases, fits with different smoothing parameters might reveal important features of the data that cannot be discerned by looking at a fit with just a single "best" smoothing parameter.
- there are several ways in which the user can control the sequence of fitting parameters (number of neighbors $nn$) examined:
  - specifying a list of $nn$ values
    - if no criterion is specified, then a separate fit is provided for each $nn$ value
    - if a criterion is specified, then all values in the $nn$ list are examined and the value that minimizes the specified criterion is selected
  - specifying a range $\left( {lower,upper} \right)$ of $nn$ values to be examined, for which the golden section search method is used to find a local minimum of the specified criterion in the given range
- provides a graphical illustration of the $nn$ nearest neighbors for each input point
- provides test problems that consist of sets of data obtained using various known functions, over which a known normal (Gaussian) noise and a certain percentage of outliers are added. These test problems allow users to see the limits of each method and to observe any numerical artifacts. The test problems can also be used to test, assess the accuracy of, and validate other FORC-type (or two-dimensional smoothing) software tools that exist in the scientific literature.
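
The range-based selection of $nn$ mentioned in the feature list can be illustrated with a minimal golden-section search over integers. This is only a sketch, not doFORC's Fortran implementation: the function name `golden_section_nn` is hypothetical, and `criterion` stands in for a statistic such as GCV or AICC evaluated from a fit with a given number of neighbors.

```python
import math


def golden_section_nn(criterion, lower, upper):
    """Return an integer nn in [lower, upper] at a local minimum of criterion.

    Hypothetical helper: `criterion` maps an integer nn to a score
    (for example GCV or AICC computed from a fit with nn neighbors).
    """
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0  # 1/phi, about 0.618
    cache = {}

    def f(n):
        # Each evaluation would require a full fit, so results are cached.
        if n not in cache:
            cache[n] = criterion(n)
        return cache[n]

    a, b = lower, upper
    while b - a > 2:
        # Two interior probe points placed by the golden ratio.
        c = int(round(b - inv_phi * (b - a)))
        d = int(round(a + inv_phi * (b - a)))
        if c == d:
            d = c + 1
        if f(c) <= f(d):
            b = d  # a minimum lies in [a, d]
        else:
            a = c  # a minimum lies in [c, b]
    # Only a few candidates remain: compare them directly.
    return min(range(a, b + 1), key=f)


# Toy criterion with a single minimum at nn = 17 (assumed for illustration).
best_nn = golden_section_nn(lambda n: (n - 17) ** 2, 5, 60)
```

As with any golden-section search, only a local minimum is found, so for a criterion with several minima the result depends on the chosen range. Examining a list of $nn$ values instead evaluates every candidate.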

## Main doFORC Interface

The main doFORC interface allows the user to:

- Configure the parameters for computation.
- Process the input data.
- [*Optional*] Graphically represent the input and output data, as well as statistics.
  - the selected data can be represented as:
    - 2D plots of the selected $y$ data versus the selected $x$ data
      - if several curves are represented simultaneously, then each of them is drawn in a different color
      - if only the residuals are selected as $y$ data, then the histogram of the residuals can be represented using "Residuals histogram"
    - 3D shaded surface of the $z$ triangulated data
    - xy projection of the $z$ triangulated data onto the coordinate plane $\left(x, y \right)$
      - the triangles are plotted with interpolated colors
      - contours can be added using "contours"
    - xy scattered projection of the $z$ surface onto the coordinate plane $\left(x, y \right)$
      - each data point is plotted as a colored rectangle where the $x$ and $y$ coordinates determine the position of the rectangle and the $z$ coordinate defines the color
      - the size of the rectangles can be set with "Marker size"
      - only the points from the selected data are plotted, without the addition of any other points to fill any possible empty spaces or to create the visual sensation of continuity. Adding new points (by interpolation, for example) could make it difficult to identify possible local numerical artifacts.
  - if the input data or the output data obtained with the *output_points = user_points* option are represented as a 2D plot (with only one $y$ selected column) or as an xy scattered projection, then the nearest neighbors for each input point can be represented graphically using "Show neighbors"
- [*Optional*] Export graphs as *bmp, eps, gif, pdf, png, svg* or *tiff* files in the working directory.
  - both *gif* and *png* are exported with a transparent background
  - *gif* is saved using an 8-bit color palette
  - all other formats are saved using a 24-bit color palette (true color)

## Configuration Interfaces

The parameters needed for data processing can be set by one of two methods:

- through a configuration file that is read in the configuration interface using "Browse config file"
  - *optionally*, after reading, the parameter values can be changed by the user through the interface
  - an erroneous value in the configuration file is replaced with the corresponding default value (except the name of the input file)
- by setting the parameters in the configuration interface
  - *optionally*, the new parameters can be saved in a new configuration file using "Save as"
    - saving is not possible if errors are present
    - the selected parameter values can be further used for data processing even if they are not saved in a configuration file
- **mandatory**: the user must verify the correctness of the parameter values using "Check config data"
  - the verification is also made by "Save as" and "Plot input data"
  - any error or warning is signaled both in the interface and in the command prompt
  - if the interface is closed before all errors are solved, then data processing is not possible

In a configuration file:

- blank lines and comments beginning with an exclamation point (!) are ignored
- lines that do not comply with the 'keyword = value' format are ignored
- the line terminator (the character or sequence of characters that marks the end of a line of text) can be CR (usually Macintosh files), LF (usually Unix files), or CRLF (usually Windows files). All lines in a given file must have the same terminator.
- the configuration files can also be created and/or modified using a plain text editor

The configuration file provided as an example contains extensive documentation in the form of comments.
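
The file rules listed above can be illustrated with a short parser. This is a minimal sketch, not doFORC's own reader: the function name `parse_config` is hypothetical, trailing comments after a value are not handled (an assumption), and, unlike doFORC, the sketch tolerates mixed line terminators within one file. The keywords in the sample text are ones that appear elsewhere on this page.

```python
import re


def parse_config(text):
    """Parse 'keyword = value' lines into a dict (hypothetical helper).

    Blank lines, lines starting with '!' and lines that do not match
    the 'keyword = value' form are skipped.
    """
    params = {}
    # Accept CR, LF, or CRLF as the line terminator.
    for line in re.split(r"\r\n|\r|\n", text):
        line = line.strip()
        if not line or line.startswith("!"):
            continue  # blank line or comment
        key, sep, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if not sep or not key or not value:
            continue  # not in 'keyword = value' form
        params[key] = value
    return params


example = (
    "! smoothing setup\r\n"
    "output_points = user_points\r\n"
    "\r\n"
    "not a setting\r\n"
    "nn_list = true\r\n"
)
print(parse_config(example))
```

All values are kept as strings here. Validating them and falling back to defaults for erroneous entries, as the interface does, would be a separate step.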

## Lite Configuration Interface

- contains only the mandatory parameters
- the optional parameters are set to the default values
- lite configuration interface can be used to:
  - read a configuration file, reading only the mandatory parameters
  - set new parameters
  - create a new configuration file
- input data are graphically represented as:
  - a plot of the magnetization vs. the applied field $\left( z \;\mathrm{vs.}\; x \right)$
  - a projection of the magnetization surface onto the coordinate planes $\left( z \;\mathrm{vs.}\; x \;\mathrm{and}\; y \right)$
    - each data point is plotted as a colored rectangle where the $x$ and $y$ coordinates determine the position of the rectangle and the $z$ coordinate defines the color
    - the size of the rectangles can be set with "Marker size"
- values entered by the user in the editable field are not read until the Enter key is pressed
- "Save as," "Check config data," and "Plot input data" read all the displayed fields
- "New config file" resets all the parameters to their default values

## Main Configuration Interface

- similar to the **Lite Configuration Interface**, but additionally contains the optional parameters
- initially all parameters are set to their default values
- the displayed fields may change depending on the value selected from a list
- the values entered by the user in the editable fields are not read until the Enter key is pressed

## Plot details

The Plot details interface allows the user to:

- edit the axes titles
- set the position and the number of the ticks on the axes
- add text on the graphs
  - the text characters are interpreted using LaTeX markup, which makes it possible to include Greek letters, special characters (such as integral and summation symbols), superscripts, and subscripts, and to modify the text type and color

## Command Prompt

The command prompt shows the messages displayed in the GUI as well as additional ones. Error or warning messages from the numerical subroutines are displayed only in the command prompt.

## Files and data saved by doFORC

doFORC saves the processed data into new files whose names are obtained from the input_file by appending new strings before the extension. If files with the same name already exist, they will be overwritten.

*input_file*__**smoothed_input** contains the smoothed values at the points from the input file, along with other information, according to the table below:

| Column | Description | Restriction |
| --- | --- | --- |
| h_app | applied field (column $x$) | N/A |
| h_rev | reversal field (column $y$) | N/A |
| m | magnetic moment (column $z$) | N/A |
| diagL | $\text{diag} \left( L \right) \equiv$ influence | ihat ≥ 1 or istat ≥ 1 |
| confi | confidence intervals | ihat = 2 or istat = 2 |
| m_fit | smoothed magnetic moment | N/A |
| res | residuals | N/A |
| res_fit | smoothed residuals | smoothresidual = true |

*input_file*__**smoothed_output** contains the requested derivatives at the output_points, saved as columns $\left( x,y,z \right)$.

*input_file*__**smoothed_output_matrix***_order_of_derivative_nn* contains the requested derivatives at the output_points, saved as matrices, one file for each value of *order_of_derivative* and/or *nn*. Restriction: only for output_points = ha_hr_regular_grid, hc_hu_regular_grid, or rectangular_grid.

*input_file*__**statistics** contains the requested statistics according to ihat or istat. If nn_list = true and a CRITERION is specified, or if nn_range = true, then the statistics for all examined nn values are saved, and the selected value that minimizes the specified CRITERION is on the last line.

## Test Functions Generator

The pseudorandom number generator is initialized by the $seed$ number, so that each value of the seed generates a different realization of the true values for the selected function:

- if $seed=0$, then the program generates a different pseudorandom sequence each time it runs, using the system clock
- if $seed\neq 0$, then each value of the $seed$ generates a different sequence of random numbers, which is the same at every run
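
The seed convention can be sketched as follows. This is an illustrative sketch only: the function `make_test_data`, the test function $z = x^2 - y^2$, the sampling of points in a square, and the default noise and outlier settings are all assumptions, not doFORC's actual generator.

```python
import random
import time


def make_test_data(n, seed, noise_sd=0.05, outlier_fraction=0.02):
    """Generate noisy samples of a known function (hypothetical sketch)."""
    if seed == 0:
        # seed = 0: derive the seed from the system clock, so runs differ.
        seed = time.time_ns()
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        z = x * x - y * y               # known "true" function (assumed)
        z += rng.gauss(0.0, noise_sd)   # normal (Gaussian) noise
        if rng.random() < outlier_fraction:
            # Occasionally add a gross error to mimic an outlier.
            z += rng.choice((-1.0, 1.0)) * 10.0 * noise_sd
        points.append((x, y, z))
    return points


# A nonzero seed reproduces the same data set at every run.
first = make_test_data(100, seed=42)
second = make_test_data(100, seed=42)
print(first == second)
```

Using a fixed nonzero seed makes it possible to compare smoothing methods on exactly the same noisy data.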

## Color maps used by doFORC

## Screenshots

2D plot of both the input data and the smoothed data.

**The visual analysis of the agreement between the input data and the smoothed values is a mandatory step, as it allows the identification of the underfitting and overfitting situations.**

If the input data or the output data obtained with the *output_points = user_points* option are represented as a 2D plot with only one $y$ selected column, then the nearest neighbors for each input point can be represented graphically using "Show neighbors"

If only the residuals are selected as $y$ data then the histogram of the residuals can be represented using "Residuals histogram"

3D shaded surface of the $z$ triangulated data

xy projection of the $z$ triangulated data onto the coordinate plane $\left(x, y \right)$

- the triangles are plotted with interpolated colors
- contours can be added using "contours"

xy scattered projection of the $z$ surface onto the coordinate plane $\left(x, y \right)$

- the size of the rectangles can be set with "Marker size"
- if the input data or the output data obtained with the *output_points = user_points* option are represented, then the nearest neighbors for each input point can be represented graphically using "Show neighbors"

The Plot details interface allows the user to edit the axes titles, to set the position and the number of the ticks on the axes, and to add text on the graphs.

The text characters are interpreted using LaTeX markup, which makes it possible to include Greek letters, special characters (such as integral and summation symbols), superscripts, and subscripts, and to modify the text type and color.