synthpop 1.9-1
---------------

CHANGES
* Removal of typos from the inference vignette.


synthpop 1.9-0
---------------
NEW FEATURES
* Two new disclosure functions calculate various disclosure measures and their 
  associated  statistics from a set of keys combined to form a quasi-identifier. 
  The basic function disclosure() calculates  measures for one target variable. 
  The other function (multi.disclosure()) calls disclosure() for a set of target
  variables, defaulting to all the variables in a   data set that are not in the
  keys. Both functions allow numerical variables to be categorised into groups
  defined by the supplied parameters. By default numeric variables are treated
  as factors with levels equal to their number of distinct values.
* A new vignette on disclosure gives details and sample code for the
  new disclosure functions.
* The function syn.ipf() now allows grouped margins that include all threeway
  and fourway interactions, as well as twoway.
* The two syn methods that can be made differentially private (DP) (`catall` and
  `ipf` can now use the Gaussian mechanism instead of the Laplace mechanism. They
  each have new parameters (`noise`) to define the type of noise to add and del,
  to define delta for the Gaussian mechanism. Exponential noise can also be added
  but this is not recommended and a warning is printed.
* New function mergelevels() that combines levels of factors in a data.frame 
  using routines from the forcats package.
* New function synorig.compare() that can be used to check synthetic data created
  by methods other than syn() to make sure that the data sets are comparable.
  Some simple fixes can be done and the changed data sets returned.
  It is called in the versions of compare.synds(), utility.gen(), utility.tab(), 
  utility.tables(), disclosure() and multi.disclosure() when these methods are
  for synthetic data supplied as a dataframe or a list of data frames. If 
  fixing is possible this is done and functions continue with fixed data. 
  At present this only happens for first elemement of a list. 

CHANGES
* Function replicated.uniques() changed. Now has additional parameter `keys`,
  a vector of variable names  used to define unique records. As well as checking 
  for replicated uniques it also checks for uniques in the synthetic data
  that are present (but not necessarily  unique in the original). 
  It now returns results as an object of class `repuniq.synds`, that can be
  printed with print.repuniq.synds. Its value is a table of numbers of 
  different types of uniques and two vectors or matrices (if object$m>1)
  that identify the records to be excluded for each type of unique
  that can be excluded with the function sdc().
* Function sdc() is modified to accept input from the new replicated.uniques()
  function. As well as the parameter `rm.replicated.uniques`, there is another
  rm.uniques.in.orig. THis may be a more appropriate strategy for low-fidelity
  synthetic data. (add ref here when I have report published).
* The functions compare.synds() and utility.tables() that calulate utility 
  measures comparing synthetic data to the original, now have a parameter
  `print.flag` (default `TRUE`) that prints out a message as each element of the
  output is calculated.

BUG FIXES
* In syn at the check for collinearity that removes some predictors, predictors
  that have a passive method are not removed.
* The function utility.tables() now has an explicit parameter `max.table` that is 
  used by utility.tab() to override the default for the maximum table size for 
  any set of tables to prevent the plots from failing.


synthpop 1.8-0
---------------

NEW FEATURES
* An option in compare.synds() to add values of a selected utility measure 
  to names of a plot facets (`utility.to.plot` argument).
* Synthesis with "catall" and "ipf" has new features that permit the creation 
  of differentialy private (DP) synthetic data. Parameters are `epsilon` and 
  `rand`.
* New smoothing methods (`smoothing` argument): "spline" and "rmean". Now you 
  can also provide a single string specifying a smoothing method, which will be
  then used to smooth all numeric variables in the data. You can still provide
  a named list to smooth selected variables only.
* New parameters `low`, `high`, `n.breaks`, and `breaks` for utility.tables()
  that allow to create a two color binned gradient and set colours for low and 
  high ends of the gradient.  

CHANGES
* Optional parameters can be set for syn.norm(), syn.lognorm(), syn.sqrtnorm(),
  syn.cubertnorm(), syn.normrank(), syn.pmm().
* syn.cubertnorm() now allows negative values.
* In multi.compare() if `msel = NULL`, pooled synthetic data copies are 
  compared with the original data.  
* Parameter `incomplete` added to summary.fit.synds() function.  
* In syn.ipf() default value of `tol` parameter changed to 1e-3 in order to 
  speed up convergence.
* Sublist elements of structural zero list must be named.
  
BUG FIXES
* In multi.compare() default colour and fill scales restored for ggplot() to
  allow for more than 9 levels/synthetic data sets to be plotted.
* Automatic detection of incomplete synthesis.
* Formatting of labels for ploting numeric variables in compare.synds() that
  could create duplicated values.
* Subsetting of columns in data frames that are also tibbles.  
* Handling of special characters in variable names.
* Labeling of multi.compare() plots.


synthpop 1.7-0
---------------

NEW FEATURES
* New function utility.tables() summarises oneway, two- or three-way tables 
  of utility by utility.tab(). Plots and tables of utility measures can be 
  produced.
* Functions utility.gen(), utility.tab(), and compare.synds() can be used 
  to assess the utility of synthetic data sets that were created NOT using 
  synthpop. They have to be provided as a data.frame or a list.
* utility.gen() and utility.tab() have a parameter `k.syn` to indicate that 
  the sample size itself has been synthesised. The default value is `FALSE` 
  that will apply to synthetic data created by synthpop. 
* The following additional statstics for each synthesis are calculated in 
  utility.tab(): "G" - the adjusted likelihood ratio chi-squared statistic 
  for comparing tables with original and synthesised values, 
  "JSD" - the Jensen-Shannon distance between the tables with original and 
* These additional statistics are computed by utility.tab() and utility.gen(): 
  "PO50" - the percentage over 50% of each synthetic data set where the model 
  used to predict real or synthetic is correct; "SPECKS" the Kolmogorov-Smirnov 
  distance comparing the prpensity scores for the synthetic and original data.
* Parameter `print.flag` is added to utility.gen() and utility.tab() to 
  suppress, if desired, output for simulations.
* compare.synds() produces a table, tab.res, that gives the oneway utility
  statistics computed by utility.tab with parameter tables = "oneway"
* New parameters `plot` and `table` in compare.synds() function to indicate,
  respectively, whether plots should be produced and tables printed. By default
  plots are printed and tables are suppressed. 

CHANGES
* Default `minnumlevels` changed from -1 to 1. This permits numeric variables 
  with a single distinct value as well as missing values to be synthesised 
  correctly.
* In utility.gen() failure to converge is changed from a warning to a failure 
  and message changed.
* Improved error messages in syn() to signal small samples or factor variables
  with too many levels.
* Matrix of predictors x in logreg.syn() is centred and xp made to match to 
  improve comptutation (thanks to Heloise Gauvin for noting this problem).
* Improved error messages and help file for syn.survctree().
* In syn() variables changed to factors from character are changed back to
  character in synthetic data.
* In syn() variables changed to factor from numeric because of only a few 
  levels, are changed back to numeric in synthetic data.
* In utility.tab() default for `digits` changed from 2 to 3.
* Several changes to a utility.gen(), including calculating p-values for 
  resampling methods empirically and the calculation of the percentage 
  correctly predicted by the model. Also changes in the saved object and in 
  the print method.
* Print method for utility.gen() changed to allow user to change 
  settings when called after the objects are created.
* In compare.synds() and multi.compare() numeric variables with fewer than 
  6 distinct values are changed to factors in order to make plots more 
  readable.
* In multi.compare() if barplot.position = "dodge", source of data 
  (original/sythetic) is mapped to an aesthetic `fill`. 

BUG FIXES
* Problems with checks when a variable has a class attribute that is of 
  length > 1 corrected.
* Bug in survctree.syn() corrected.
* Bug in utility.tab() when classInt() returns duplicate values for breaks 
  when `style = "quantile"`, now the default, corrected.
* A synthetic version of a logical variable synthesised using function 
  syn.cart() is also logical.
* Passive functionality has been improved to allow checking if the passive 
  method would give the right answer in the original data and produce a failure
  if it fails.
* Synthesis of variables with assigned labels (note that synthetic values 
  will not have labels).  


synthpop 1.6-0
---------------

NEW FEATURES
* Incomplete synthesis is detected automatically from the details of
  the methods used to synthesise a data set. Functions glm.synds(), lm.synds(), 
  polr.synds() and multinomial.synds() return another component `incomplete` 
  of their result. The function summary.fit.synds() no longer has the parameter 
  `incomplete`. In the previous version when a user fitted a model where some 
  of the variables were not synthesised and `m = 1`, summary.fit.synds() would 
  stop with an error. In the current version calculations are carried out 
  as if all variables had been synthesised (`incomplete` is set to `FALSE`)
  but a warning is printed.
* New function polr.synds() to fit ordered logistic models to synthetic data 
  using the polr() function from the package MASS. The compare.fit.synds() 
  function can be used to compare the results of polr.synds() with those 
  based on the original data.
* New synthesising function syn.ranger() which uses a fast implementation 
  of random forests (contributed by Caspar van Lissa).  

CHANGES
* The vignette on inference has been updated.
* synthpop website address added in a DESCRIPTION file. 
* "cart" default method is explicitly defined in the syn() function.
* Correction of a Value part in the help files for syn.methods.
* Warning when numeric variables with five or fewer levels not changed 
  to factors.

BUG FIXES
* Method "polr" (ordered logistic regression) is not replaced by "polyreg"
  (unordered logistic regression) if not necessary. A bug due to passing of an 
  unnecessary parameter `smoothing` removed. 
* In cart.syn() when a numeric variable is synthesised and the synthetic values 
  of the explanatory variables cannot be classified to a final node of a tree
  they are randomly assigned to one of the final children nodes.
* In glm.synds() when used with `family = gaussian` the variances are rescaled 
  with the residual variance estimate.
* Numeric varaiables with missing values which are not synthesised but used as 
  predictors for other variables are kept unchanged.


synthpop 1.5-1
---------------

BUG FIXES
* Not running an example that mostly fails due to too many small categories in 
  the synthesised data set.


synthpop 1.5-0
---------------

NEW FEATURES
* New methods "catall" and "ipf" that synthesise a group of variables together 
  at the start of the synthesis.
* New parameters `numtocat` and `catgroups` for syn(). Variables in `numtocat` 
  are converted into categorical variables with breaks determined by their 
  distribution and `catgroups` gives the target number of groups for each 
  variable. The data with the categorical versions are then synthesised and 
  finally the synthesised variables in `numtocat` are created from bootstrap 
  samples within the categories. This feature was developed for the  second 
  stage of "ipf" and "catall" but can be used with any method. Variables
  in `numtocat` must have a method suitable for categorical data.
* New function numtocat.syn() can be used to group numeric variables into 
  categories. It can be used before synthesis to create a new data frame that 
  can then be synthesised. It should be used instead of the `numtocat` parameter 
  of syn() if you want to keep the categorical variables in the synthetic data.
* The function nested.syn() can now be used with continuous variables nested
  within categories. It also allows smoothing of the non-missing data.
* New function codebook.syn() that can be used to check features of data 
  before synthesis.
* New parameter to compare.synds() `stat` with possible values "percents" or 
  "counts" allows tables and plots to display counts instead of percentages in 
  groups.

CHANGES
* Some improvements to utility.tab(): 
  - default for `print.tables` is to print if up to a 3-way table;
  - a new parameter `useNA` to include or exclude NA values from tables;
  - a new parameter `print.stats` to allow a choice of what statistics to be 
    printed and the default is to print only the Voas-Williamson statistic with 
    a simpler format.
* Format of printed output from utility.gen() has been changed to emphasise the 
  ratio to the expected as the most important measure.
* `mincriterion` for syn.ctree() changed to `0.9`.
* If the syn() parameter `models = TRUE`, models are stored for the variables
  in the visit sequence and for missing values in the continous variables.

BUG FIXES
* utility.tab() and utility.gen() for synthetic data generated from syn.strata().
* Smoothing for "sample" method.


synthpop 1.4-4
---------------

BUG FIXES
* Compatibility with the new version of ggplot2 (2.3.0).
* Check in multi.compare() for missing argument `var`.
* Check in sdc() for data type of variables to be smoothed.
* Size of $syn when k < n and all synthesising methods without predictors.


synthpop 1.4-3
---------------

CHANGES
* In padModel.syn() the default for newly defined synthesising methods is 
  NOT to create dummy variables from factors before fitting models.

BUG FIXES
* Extraction of variable name from a tranformed dependant variable in 
  glm.synds(), lm.synds() and multinom.synds().
* In syn.polyreg() numerical variables used as predictors for the multinom() 
  function are scaled so as to have a range (0,1) to improve convergence. An 
  extra warning is included if all the preicted values are in one category, 
  which can result when the model fails to iterate due to sparse data.


synthpop 1.4-1
---------------

NEW FEATURES
* New version of utility.gen() and print.utility.gen(). These now implement 
  the methods described in Snoke et. al. and include the following:
  - appropriate method of getting the null distribution is selected based on 
    synthesis details;
  - warning messages if CART models fail to split;
  - allow a seed value to be set and stored for resampling methods;
  - printing Z scores from logit models using the average fit over different 
    syntheses.
* New function multinom.synds() to fit multinomial models to synthetic data 
  using the multinom() function from the package nnet.

CHANGES
* Default `cp` parameter in syn.cart() changed to 1e-8.
* Default `print.tables` parameter in utility.tab() changed to `TRUE`.
* All syn() result components that are lists, e.g. `cont.na`, have their
  elements named.
  
BUG FIXES
* Call to classIntervals() in utility.tab() uses style = "fisher" as default 
  to avoid problems for variables with a small number of unique values. 
* utility.tab() computes breaks for grouping from the combined data rather 
  than just from the observed. This avoids problems when synthetic values are 
  outside the range of the observed ones.
* In compare.fit.synds() the quantity real.varcov needs to be multiplied by 
  sigma^2 when fitting.function is "lm".


synthpop 1.4-0
---------------

NEW FEATURES
* Vignette on inference from fitted models (we are grateful to Joerg Drechsler 
  for his comments).
* New function syn.satcat allows saturated categorical models to be fitted
  from all possible interactions of the predictor variables.

CHANGES
* utility.tab() returns p-values for utility measures rather than standardised 
  versions of the measures.
* `smooth.vars` parameter added to sdc() function which allows smoothing of
  numeric variables in the synthesised dataset.
* If `models = TRUE` in syn(), for `logreg` and `ployreg` coefficients of the 
  fitted model are returned. 
* Output from print.summary.fit.synds() is labelled differently according to
  whether `population.inference` is TRUE or FALSE (see vignette on inference). 
  Also it now includes p-values and stars as are shown for lm() and glm().
* Warning messages are given in summary.fit.synds() when inference is attempted
  from an invalid model where synthsis has not conditioned on any variables
  that have been left unsynthesised.
* The `msel` parameter of print.summary.fit.synds() prints a table of estimates 
  rather than a lsiting of each fit in detail.
* Several changes in compare.fit.synds() are described in detail in the new 
  vignette on inference ('Inference from fitted models in synthpop'). These 
  include extensions of the lack-of-fit tests and confidence interval overlap 
  measures for different options.

BUG FIXES
* replicated.uniques() for a single variable.
* syn.cart() for logical variables without missings (thanks to bug report 
  by Ruben Arslan).
* syn() can be used without loading the package (thanks to reported issue
  by Ruben Arslan)   
* Missing data factor level <NA> as a stratum in syn.strata()  
* `seed` for syn.strata(). 
* print.fit.synds() for syn.strata() object.


synthpop 1.3-2
---------------

CHANGES
* For consistency reasons tab.utility() changed to utility.tab() and 
  utility.synds() to utility.gen().
* utility.gen() under major revision and temporarily unavailable.  
* Revision of utility.tab() function: a measure of fit proposed by Voas and 
  Williams and one proposed by Freeman and Tukey are calculated; 
  continous variables are categorised using classIntervals() function.
* Revision of compare.fit.synds() function: new lack-of-fit measures and mean 
  values for existing analysis-specific utility measures (confidence interval 
  overlap and absolute standardized difference between coefficient); 
  `return.result` parameter replaced by `print.coef` with slightly different 
  functionality - analysis-specific utility measures are always printed but you
  can choose whether to print or not model estimates. 

BUG FIXES
* compare.synds() for integer variables without missing values (thanks to bug 
  report by Joerg Drechsler).


synthpop 1.3-1
---------------

CHANGES
* Update of the vignette.

BUG FIXES
* compare.synds() for variables with NA values in observed but not in 
  synthetic data returns correct value (0) for NA category in synthetic data.
* Invalid `times` argument corrected (lists of numbers coerced to numbers).


synthpop 1.3-0
---------------

NEW FEATURES
* Storing results of CART models when `models` set to TRUE.
* Function syn.strata() for stratified synthesis.
* Function multi.compare() for multivariate comparison of synthesised and 
  observed data.
* Synthesising method "nested" for a variable nested within another variable.
* Tabular utility function tab.utility() for comparing contingency tables from 
  observed and synthesized data.
* Parameter `uniques.exclude` for the sdc() function, which can be used to 
  remove some variables from the identification of uniques.
* Function replicated.uniques() returns a number of unique individuals in the 
  original data set ($no.uniques).
  
CHANGES
* Synthetic values of collinear variables are derived based on the one that 
  is synthesised first and their method is set to "collinear". They do not 
  have to be removed prior to synthesis. 
* Synthesising method for constant variables is set to "constant" and the 
  variables are not removed from the synthesised data set when 
  `drop.not.used = TRUE`. 
* Default synthesising `method` changed to "cart".
* Default `minnumlevels` changed to -1 (during synthesis numeric variables are 
  not changed to factors regardless of the number of distinct values). 
* Coefficient estimates and their confidence intervals are ploted in the same 
  order as they are presented in a tabular form.   
* No message on the seed value used (it is stored in the result object).
* Formula of the model to be fitted using glm.synds() or lm.synds() can be 
  specified outside the function.
* Message for sdc() on number of replicated uniques also when it is equal to 
  zero.   
* Maximum number of iterations for a multinomial model used in `polyreg` and 
  `polr` method increased to 1000 (`maxit` parameter). Message if the limit is 
  reached.
* write.syn() saves complete synds object into a file synobject_filename.RData. 
* Error on exceeding `maxfaclevels` in not generated if `method` for the factor
  is set to "sample" or "nested".
* For constant variables method is changed to "constant". 
* Year format for variables `ymarr` and `ysepdiv` in SD2011 dataset changed 
  from `yy` to `yyyy`.     

BUG FIXES
* Types and placement of special signs that are allowed in `rules` have been 
  extended and include e.g. initial and closing round bracket.
* compare.synds() provides output for logical variables.
* Synthesis of logical variables with missing values.
* Message about a change of method for a variable without predictors.
* Check for `filetype` in write.syn() 


synthpop 1.2-1
---------------

BUG FIXES
* No calling var(x) on a factor x (in checks).
* No `contrasts` attribute for factors synthesised using parametric method.
* Misspelled vector name (nlevels) replaced with a correct one (nlevel).


synthpop 1.2-0
---------------

NEW FEATURES
* A new function utility.synds() for distributional comparison of synthesised 
  data with the original (observed) data using propensity scores. 
* New measures for comparing model estimates based on synthesised and observed 
  data implemented in compare.fit.synds() function: standardized differences 
  in coefficient values(`coef.diff`) and confidence interval overlap (`ci.overlap`).

CHANGES
* No dependency on `coefplot` package.  
* Default for `drop.not.used` changed to FALSE.


synthpop 1.1-1
---------------

CHANGES
* Both variable names and their column indices can be used in `visit.sequence`.
* Arguments `rules`, `rvalues`, `cont.na`, `semicont`, `smoothing`, `event`,
  `denom` are specified as named lists, e.g. rules = list(marital = "age < 18")
  and do not have to be specified for all variables.
* Optional arguments can be passed to synthesising functions by specifying 
  `funname.argname` arguments, e.g. ctree.minbucket = 5; they are 
  function-specific; `minbucket` removed from arguments.  
* Smoothing is possible for numeric variables when synthesised with the method 
  "sample".
* compare() is a generic function with two methods (for class `synds` and 
  `fit.synds`); it replaced two separate functions.   
* New argument `return.plot` for compare() method for class `fit.synds`.
* New argument `msel` for compare() method for class `synds`, which 
  allows comparison for pooled or selected data set(s). Results for multiple
  synthetic data sets can be plotted on the same graph. 
* New argument `nrow` for compare() method for class `synds`; `nrow`
  and `ncol` determine number of plots per screen.
* Argument `plot.na` for compare() method for class `synds` is no longer 
  required and missing data categories for numeric variables are ploted 
  on the same plot as non-missing values. 
* Argument `object` of lm.synds() and glm.synds() functions changed to `data`. 
* print() method for class `fit.synds` gives by default combined coefficient 
  estimates only.
* summary() method for class `fit.synds` gives combined coefficient 
  estimates and their standard errors.
* summary() method for class `synds` with multiple synthetic data sets 
  provides by default summaries that are calculated by averaging summary 
  values for all synthetic data copies.
* Argument `obs.data` of compare.fit.synds() function changed to `data`. 
* Method `surv.ctree` and `cart.bboot` changed to `survctree` and `cartbboot`.

BUG FIXES
* `denom` and `event` for variables with missing data.
* `maxfaclevels` can be increased.
* Continuous variables with missing data when zero is a non-missing value.
* Synthesis of a single variable (with or without auxiliary predictors) now 
  works.


synthpop 1.1-0
---------------

NEW FEATURES
* Function sdc() for statistical disclosure control of the synthesised data 
  set(s); function replicated.uniques() to determine which unique units in the 
  synthesised data set(s) replicates unique units in the original data set.    
* Function read.obs() to import original data sets form external files.
* Function write.syn() to export synthetic data sets to external files and 
  create a text file with information about the synthesis.
* syn() has new `semicont` parameter that allows to define spike(s) 
  for semi-continuous variables in order to synthesise them separately.
* `lognorm`, `sqrtnorm` and `cubertnorm` methods for synthesis by linear 
  regression after natural logarithm, square root or cube root transformation 
  of a dependent variable.  
* `seed` argument for syn() function.

CHANGES
* Revised output of summary.fit.synds() and compare.fit.synds(); 
  standard errors of Z scores corrected (se(Z.syn)) 
  (thanks to Joerg Drechsler).
* Figures for compare.fit.synds() and compare.synds() functions plotted 
  using ggplot2 functions.  
* period.separated or alllowercase naming convention has been adopted and 
  parameter names `populationInference`, `visitSequence`, `predictorMatrix`,
  `contNA`, `defaultMethod`, `printFlag` and `nlevelmax` have been changed to
  `population.inference`, `visit.sequence`, `predictor.matrix`, `cont.na`,
  `default.method`, `print.flag` and `minnumlevels` respectively.
* Default for drop.pred.only changed to FALSE.

BUG FIXES
* Rounding procedure (thanks to bug report by Joerg Drechsler).
* Warning about extra disregarded argument `family` in compare.fit.synds().