The main ideas for the answer are:
(1) Collapse the target classes to just two: presence or absence of heart disease. As the dataset documentation states, "Experiments with the Cleveland database have concentrated on simply attempting to distinguish presence (values 1,2,3,4) from absence (value 0)." So, collapse target values 1,2,3,4 to just "1".
(2) Categorical variables should be properly encoded for use with neural network classifiers. For example, use one-hot encoding on the categorical variables. The same documentation indicates which variables are categorical.
(3) This is a small dataset, so the choice of the validation and test data will affect the bias and variance of the observed accuracy. Using k-fold cross-validation, it is easy to observe accuracies over 80% for the two-class problem of presence vs. absence of heart disease.
Implementation with (1) fitcnet & Classification Learner app, OR (2) patternnet
(1) fitcnet and Classification Learner app
Let's first try an easy comparison of multiple machine learning models using the Classification Learner app. To start, prepare the data for loading into the app. This short script starts from the data you attached, adds variable names, designates the categorical variables, imputes missing values using the mode, and collapses the target to just two classes.
data = readtable('processed.cleveland.csv');
data.Properties.VariableNames = {'age', 'sex', 'cp', 'trestbps', 'chol', ...
    'fbs', 'restecg', 'thalach', 'exang', 'oldpeak', 'slope', 'ca', 'thal', 'num'};
iscat = [0 1 1 0 0 1 1 0 1 0 1 0 1 0];   % 1 marks a categorical variable
for i = 1:14
    if iscat(i)
        data.(i) = categorical(data.(i));
    end
    if any(ismissing(data.(i)))           % impute missing values with the column mode
        data(:,i) = fillmissing(data(:,i),'constant',table2array(mode(data(:,i))));
    end
end
% Collapse the target to two classes: 0 = absence, 1 = presence (original values 1-4)
data.(14) = categorical( double( data.(14)>=1 ) );
head(data)
age    sex    cp    trestbps    chol    fbs    restecg    thalach    exang    oldpeak    slope    ca    thal    num
___    ___    __    ________    ____    ___    _______    _______    _____    _______    _____    __    ____    ___
63     1      1     145         233     1      2          150        0        2.3        3        0     6       0
67     1      4     160         286     0      2          108        1        1.5        2        3     3       1
67     1      4     120         229     0      2          129        1        2.6        2        2     7       1
37     1      3     130         250     0      0          187        0        3.5        3        0     3       0
41     0      2     130         204     0      2          172        0        1.4        1        0     3       0
56     1      2     120         236     0      0          178        0        0.8        1        0     3       0
62     0      4     140         268     0      2          160        0        3.6        3        2     3       1
57     0      4     120         354     0      0          163        1        0.6        1        0     3       0
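Before loading the table into the app, it may be worth a quick sanity check that the imputation and the target collapse did what was intended. The lines below are a minimal sketch that assumes `data` is the table produced by the preparation script above; the variable names `classNames` and `classCounts` are mine, not part of the original script.

```matlab
% Assumes `data` is the table produced by the preparation script above.
% No missing values should remain after the mode imputation.
assert(~any(ismissing(data), 'all'), 'Some values are still missing.');

% The target should now have exactly two categories (absence / presence).
classNames  = categories(data.num);   % category labels of the collapsed target
classCounts = countcats(data.num);    % number of observations in each class
disp(table(classNames, classCounts));
```

If the two classes turn out to be very unbalanced, plain accuracy becomes harder to interpret, so it helps to know the class counts up front.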
Now, load the data into the Classification Learner app, and choose 10-fold cross-validation:
Once in the app, choose "All" models, "Optimizable Neural Network", and "Optimizable Ensemble" from the models gallery. After training those models, the following results (or similar) are obtained, with many models achieving over 80% accuracy. The exact results will vary, depending on the cross-validation partition and optimization results. The Optimizable Neural Network achieves 85.5% accuracy on the validation data (this is 10-fold cross-validation accuracy). In this case, it turns out that the optimization process chose a neural network with just one layer, and one node in that layer. So, a very simple neural network can do pretty well for this data.
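For anyone who prefers the command line to the app, the same kind of 10-fold cross-validation can be sketched with fitcnet directly. This is only an approximation of what the app did: it assumes `data` is the table from the preparation script above, it hard-codes the one-layer, one-node architecture mentioned above instead of running the hyperparameter optimization, and the seed and the 'Standardize' setting are my own assumptions. The accuracy it reports will therefore not match the app's number exactly.

```matlab
% Assumes `data` is the table produced by the preparation script above.
rng(0);  % arbitrary seed so the cross-validation partition is repeatable

% 10-fold cross-validated neural network classifier. One hidden layer with
% one node mirrors the architecture the optimizer happened to pick; the
% other hyperparameters are left at their defaults (an assumption).
cvMdl = fitcnet(data, 'num', 'LayerSizes', 1, 'Standardize', true, 'KFold', 10);

cvAccuracy   = 1 - kfoldLoss(cvMdl);                        % averaged over all folds
foldAccuracy = 1 - kfoldLoss(cvMdl, 'Mode', 'individual');  % one value per fold

fprintf('10-fold CV accuracy: %.1f%%\n', 100*cvAccuracy);
fprintf('Per-fold accuracy range: %.1f%% to %.1f%%\n', ...
    100*min(foldAccuracy), 100*max(foldAccuracy));
```

Printing the per-fold values alongside the average shows how much a single split can differ from the cross-validated figure on a dataset this small.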

Next, export the best performing neural network to the workspace, using the "Export Model" option. Inside the model, the expanded predictor names can be seen (look at trainedModel.ClassificationNeuralNetwork.ExpandedPredictorNames) indicating that fitcnet has automatically done the one hot encoding, based on which variables are categorical.
>> trainedModel.ClassificationNeuralNetwork.ExpandedPredictorNames
ans =
1×25 cell array
Columns 1 through 8
{'age'} {'sex == 0'} {'sex == 1'} {'cp == 1'} {'cp == 2'} {'cp == 3'} {'cp == 4'} {'trestbps'}
Columns 9 through 15
{'chol'} {'fbs == 0'} {'fbs == 1'} {'restecg == 0'} {'restecg == 1'} {'restecg == 2'} {'thalach'}
Columns 16 through 22
{'exang == 0'} {'exang == 1'} {'oldpeak'} {'slope == 1'} {'slope == 2'} {'slope == 3'} {'ca'}
Columns 23 through 25
{'thal == 3'} {'thal == 6'} {'thal == 7'}
(2) patternnet
Similar results can be obtained using patternnet, but there will be some differences from fitcnet due to the different training algorithm. Remember to collapse the target classes to just two, and one-hot encode the categorical variables. Also, given the train/validation/test split used by patternnet training, one will generally be looking at the test accuracy, which is roughly comparable to the accuracy of a single fold in a cross-validation scheme. Because a single test split contains far fewer samples than the full dataset, the per-fold accuracy has much higher variance than the k-fold cross-validation accuracy, which is averaged across all folds.
I observed "per-fold" test accuracies ranging from a low around 73% to a high around 90%, with the average around 82-83%, using a simple patternnet and the default training algorithm (no hyperparameter optimization). In the fitcnet case above, the app doesn't report the per-fold validation accuracy, but it will similarly fall in a relatively wide range, with the average across all folds being around 85% after hyperparameter optimization (we observed 85.5% above) for a simple neural network.
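Here is a rough sketch of the patternnet route, in case it helps. It assumes `data` is the table from the preparation script above (categorical predictors already designated, two-class target in the last column). The hidden layer size of 10, the seed, and the 70/15/15 split are arbitrary choices of mine, not necessarily the settings behind the accuracies quoted above, and dummyvar (Statistics and Machine Learning Toolbox) is just one way to do the one-hot encoding.

```matlab
% Assumes `data` is the table produced by the preparation script above.
rng(0);  % arbitrary seed so the random data division is repeatable

% Build the predictor matrix, one-hot encoding the categorical variables.
X = [];
for i = 1:width(data)-1
    col = data.(i);
    if iscategorical(col)
        X = [X, dummyvar(col)];   %#ok<AGROW> one column per category
    else
        X = [X, col];             %#ok<AGROW> numeric predictors are kept as-is
    end
end
T = dummyvar(data.(width(data)));  % N-by-2 one-hot targets (absence, presence)

net = patternnet(10);              % hidden layer size is an arbitrary choice
net.divideParam.trainRatio = 0.70;
net.divideParam.valRatio   = 0.15;
net.divideParam.testRatio  = 0.15;
net.trainParam.showWindow  = false;

% patternnet expects one sample per column, hence the transposes.
[net, tr] = train(net, X', T');

% Accuracy on the held-out test samples only.
scores = net(X(tr.testInd, :)');
[~, predicted] = max(scores, [], 1);
[~, actual]    = max(T(tr.testInd, :)', [], 1);
testAccuracy = mean(predicted == actual);
fprintf('Test accuracy: %.1f%%\n', 100*testAccuracy);
```

Rerunning this with different seeds gives a feel for how much the test accuracy moves from one random split to the next.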
