Last Updated: February 25, 2016 · dionysios

Splitting Vectors

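Often, we need to split a vector into smaller parts of roughly equal size and then combine some of those parts into a single subset vector. The example below implements a leave-one-group-out scheme (as used in k-fold cross-validation): each group in turn serves as the test set, and the remaining groups are merged into the training set: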

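set.seed(42);                             # Same result every time
inputVector = rnorm(22);
numberOfGroups = 4;
# Labels per group, rounded down: 5 here, so only 20 labels for 22 points
numberOfElements = floor(length(inputVector) / numberOfGroups);
# Shuffle the labels and split; split recycles the labels for the last two points
groups = split(inputVector, sample(rep(1:numberOfGroups, numberOfElements)));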

for(leaveOutGroup in seq(numberOfGroups)){
    trainingPoints = unlist(groups[-leaveOutGroup], use.names = FALSE);
    testPoints = unlist(groups[leaveOutGroup], use.names = FALSE);
    #...
}

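Note that the length of the input vector does not have to be a multiple of the number of groups. In the example above, the shuffled label vector has only 20 entries for 22 data points, so split recycles its first two labels for the last two points. Two of the four groups will therefore usually be of length six (or a single group of length seven, if those two labels happen to coincide), and R will give a warning that the data length is not a multiple of the split variable. The warning can be suppressed, along with all other warnings in the session, by: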

options(warn=-1)
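
The warning can also be avoided rather than suppressed. The following is a minimal sketch, not part of the original recipe, that assumes it is acceptable to build one label per element with the length.out argument of rep; the names groupLabels and groupSizes are introduced here for illustration only:

```r
set.seed(42)
inputVector <- rnorm(22)
numberOfGroups <- 4

# One label per element: 1, 2, 3, 4, 1, 2, ... cut off at the data length,
# so split never has to recycle the labels (assumption: this replaces the
# floor()-based label vector used above)
groupLabels <- rep(seq_len(numberOfGroups), length.out = length(inputVector))

# Shuffle the labels so that group membership is random, then split
groups <- split(inputVector, sample(groupLabels))

# Group sizes differ by at most one element
groupSizes <- lengths(groups)
print(groupSizes)
```

If the original label vector is kept, wrapping only the split call in suppressWarnings() is a narrower alternative to changing the global warn option.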

If the data consist of more than one vector, we can split the indices with the same technique and then use them to pick the corresponding entries from each vector of interest:

options(warn = -1);
set.seed(42);

inputVector  = rnorm(22);
outputVector = rnorm(length(inputVector));
indexVector  = seq_along(inputVector);    # 1, 2, ..., 22

numberOfGroups = 4;
numberOfElements = floor(length(inputVector) / numberOfGroups);
indexGroups = split(indexVector, sample(rep(1:numberOfGroups, numberOfElements)));

for(leaveOutGroup in seq(numberOfGroups)){
    training = unlist(indexGroups[-leaveOutGroup], use.names = FALSE);
    test     = unlist(indexGroups[leaveOutGroup], use.names = FALSE);
    #...    
}
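
As an illustration of how the index groups might then be used, here is a minimal sketch that fits a model on the training indices and scores it on the test indices. The linear model (lm), the data frame column names x and y, and the variables groupLabels, fit, predicted and mse are assumptions made for this example; the original leaves the loop body open.

```r
set.seed(42)
inputVector  <- rnorm(22)
outputVector <- rnorm(length(inputVector))
numberOfGroups <- 4

# Shuffled group labels, one per element, so no recycling warning arises
groupLabels <- sample(rep(seq_len(numberOfGroups), length.out = length(inputVector)))
indexGroups <- split(seq_along(inputVector), groupLabels)

# Hypothetical loop body: mean squared test error of a linear fit per group
mse <- numeric(numberOfGroups)
for (leaveOutGroup in seq_len(numberOfGroups)) {
    training <- unlist(indexGroups[-leaveOutGroup], use.names = FALSE)
    test     <- unlist(indexGroups[leaveOutGroup], use.names = FALSE)

    # The same indices select matching entries from both vectors
    trainingData <- data.frame(x = inputVector[training], y = outputVector[training])
    testData     <- data.frame(x = inputVector[test])

    fit <- lm(y ~ x, data = trainingData)
    predicted <- predict(fit, newdata = testData)
    mse[leaveOutGroup] <- mean((outputVector[test] - predicted)^2)
}
print(mse)
```

Because only indices are split, any number of parallel vectors can be subset consistently without splitting each one separately.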

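The R function sample randomly permutes the group labels, and split returns a list with one element per group. We collapse the selected list elements into a single vector using unlist, with use.names set to FALSE so that the result carries no element names.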
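
The effect of use.names can be seen on a small example. This is a sketch with made-up values; the variable names withNames and withoutNames are introduced here for illustration:

```r
# split returns a list with one element per group label
groups <- split(c(10, 20, 30, 40), c(1, 2, 1, 2))

# By default, unlist builds element names from the list names
withNames <- unlist(groups)

# With use.names = FALSE the result is a plain, unnamed vector
withoutNames <- unlist(groups, use.names = FALSE)

print(class(groups))
print(withNames)
print(withoutNames)
```

Dropping the names keeps the combined vector tidy and saves the work of constructing names that are never used.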