If there is only one converged run, the `sivs` will break

There is one very rare condition that `sivs()` does not catch and it will break without a proper error. In this post/issue I'm going to first dissect the way this issue can happen, and then I will list how this can be addressed, and ultimately I elaborate on my preference on how this should be addressed.

The reasons I'm writing this are:
- be transparent
- give the possibility to the users and community to engage and express their opinion, and to discuss solutions
- have a good and comprehensive documentation on this extremely rare issue (one single commit would not do justice to the amount of time I put into investigating this issue. I only received an email with an error and I didn't have access to the input data to investigate it, so I had to go through many hypotheses and run part of the code in my head with imaginary data to test those hypotheses).

## Dissection of the problem

In the following line we are removing the runs that have failed to converge (due to whatever reason that is relevant to the chosen method)

https://github.com/mmahmoudian/sivs/blob/4a57625f2e430e7cb785dec5b9f9bd8b3a75c41d/R/sivs.R#L1010-L1011

and in the following lines we are making sure that we have at least something to proceed with:

https://github.com/mmahmoudian/sivs/blob/4a57625f2e430e7cb785dec5b9f9bd8b3a75c41d/R/sivs.R#L1013-L1019

and if we have at least one  good run, we proceed with this beautiful and compact part to extract the coefficients in a data.frame format:

https://github.com/mmahmoudian/sivs/blob/4a57625f2e430e7cb785dec5b9f9bd8b3a75c41d/R/sivs.R#L1021-L1030

The issue raises when `clean.iterative.res` has length of 1, which subsequently would cause the `coef.df` to be a data.frame with only two columns. When we move the row names to row.names, the data.frame would be left with only one column which consequently turn into a vector:

https://github.com/mmahmoudian/sivs/blob/4a57625f2e430e7cb785dec5b9f9bd8b3a75c41d/R/sivs.R#L1032-L1034

This can be reproduced with this small demo:

```r
# only keeping two columns
a <- iris[, 1:2]
# convert the first column to some unique character values
a[, 1] <- paste0("a", 1:nrow(a))
# the following two lines is exactly what we do in the code chuck above (sivs.R#L1032-L1034)
row.names(a) <- a[, 1]
a <- a[, -1]
```

```console
> a
  [1] 3.5 3.0 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 3.7 3.4 3.0 3.0 4.0 4.4 3.9 3.5 3.8 3.8 3.4 3.7 3.6 3.3 3.4 3.0 3.4 3.5 3.4 3.2 3.1 3.4 4.1 4.2 3.1
 [36] 3.2 3.5 3.6 3.0 3.4 3.5 2.3 3.2 3.5 3.8 3.0 3.8 3.2 3.7 3.3 3.2 3.2 3.1 2.3 2.8 2.8 3.3 2.4 2.9 2.7 2.0 3.0 2.2 2.9 2.9 3.1 3.0 2.7 2.2 2.5
 [71] 3.2 2.8 2.5 2.8 2.9 3.0 2.8 3.0 2.9 2.6 2.4 2.4 2.7 2.7 3.0 3.4 3.1 2.3 3.0 2.5 2.6 3.0 2.6 2.3 2.7 3.0 2.9 2.9 2.5 2.8 3.3 2.7 3.0 2.9 3.0
[106] 3.0 2.5 2.9 2.5 3.6 3.2 2.7 3.0 2.5 2.8 3.2 3.0 3.8 2.6 2.2 3.2 2.8 2.8 2.7 3.3 3.2 2.8 3.0 2.8 3.0 2.8 3.8 2.8 2.8 2.6 3.0 3.4 3.1 3.0 3.1
[141] 3.1 3.1 2.7 3.2 3.3 3.0 2.5 3.0 3.4 3.0
```

And this is the problem, because the following `apply()` expects a tabular object but what it will receive is just a vector:

https://github.com/mmahmoudian/sivs/blob/4a57625f2e430e7cb785dec5b9f9bd8b3a75c41d/R/sivs.R#L1036-L1039



-------

## Possible solutions and my thoughts

There are many ways to solve this programmatically, like converting the `coef.df` to a matrix or even using the tibble package for all tabular data we have in the SIVS package; but the question is more fundamental imho (when `nfolds > 1`) :
- If we end up with one single "converged" run, should we even consider moving forward?
- Is this one single run a reliable run? (because it has worked in only one particular sample configuration)
- Should we do the Recursive Feature Elimination step using this one "lucky" run?

I would argue that the RFE should not be ran, because having one single lucky run is well against having a reproducible research and what "Stable Iterative Variable Selection" is all about.

If we decide not to move to the next step in SIVS (i.e recursive feature elimination), then the question would be on how this should be reported to user and what should we return? Possible options are
1. Throw an error and not returning anything to user (even that one lucky run)
2. Throw a warning and return whatever we have so far

I would argue that we should throw an error and do not return anything to user. This is because, in spite of writing it in the article that SIVS should not be treated as a black-box method, many naively would still run the code and expect magic, and in the off chance of hitting this "lucky" run, they will just go with what SIVS is providing and they will move forward without thinking twice about what has happened and why. Perhaps when #1 is implemented, the user can have access to the list of coefficients and can do some sort of RFE themselves at their own risk if they want to.

## Reports

This issue has been reported on Stackoverflow as well:

- https://stackoverflow.com/questions/72056255/sivs-r-package-error-in-applycoef-df-1-functionf-dimx-must-have-a-pos/

## Acknowledgment

I would like to thank [Alexander Biehl](https://orcid.org/0000-0002-8558-576X) who reported this issue. <s>I will edit this post when/if I get their consent to put their real name or GithubID here.</s> As I mentioned, this is a very rare case and Alex responsibly and kindly reported the issue to me via email. The time he invested in writing that email should also be factored in for resolution of this issue. Also Alex pointed me to the Stackoverflow post which I mentioned above.

	# if there were not any runs that have converged
	if(length(clean.iterative.res) == 0){
	stop("In the Iterative step, none of the ", length(iterative.res),
	" runs got converged. This means that sivs cannot progress to",
	" subsequent steps, and you should use different",
	" 'method' or normalize your data differently.")
	}

	coef.df <- Reduce(function(...){ merge(...,
	by = "names",
	all = TRUE) },
	sapply(names(clean.iterative.res),
	FUN = function(item) {
	temp <- clean.iterative.res[[item]]$coef
	colnames(temp)[2] <- paste0("coef.", item)
	return(temp)
	},
	simplify = FALSE))

	# calculate how many times each feature has been given a coefficient other than 0 and them sort them decreasingly
	selection.freq <- sort(x = apply(coef.df, 1, function(f){
	sum(f != 0)
	}), decreasing = TRUE)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

If there is only one converged run, the `sivs` will break #2

Dissection of the problem

Possible solutions and my thoughts

Reports

Acknowledgment

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

	# remove the runs that could not converge (and as the result have NA instead of coefficients' dataframe)
	clean.iterative.res <- iterative.res[!sapply(lapply(iterative.res, "[[", "coef"), is.logical)]

	## set the feature names to be the rownames
	row.names(coef.df) <- coef.df[, 1]
	coef.df <- coef.df[, -1]

If there is only one converged run, the sivs will break #2

Description

Dissection of the problem

Possible solutions and my thoughts

Reports

Acknowledgment

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions

If there is only one converged run, the `sivs` will break #2