How to split a column's elements into two vectors based on labels?

I attached part of a lung dataset (32x57). Its last column holds the labels (1 or 2). I want to split each column into two vectors based on the labels:
F(i).normal, a vector holding the column's elements with label 1,
F(i).tumor, a vector holding the column's elements with label 2.
I attached my MATLAB code.
The code does not seem to add each column's elements to the vectors correctly. I'd be very grateful for your opinion.
close all;
clc
load lung.mat
F=lung;
[n,m]=size(F);
for i=1: m
s1=0; s2=0;
for j=1: n
if (F(j,m)==1)
for z=1:s1
F(i).normal(z)=F(j,i);
s1=s1+1;
end
else
for x=1:s2
F(i).tumor(x)=F(j,i);
s2=s2+1;
end
end
end
end

Accepted Answer

Image Analyst on 27 Dec 2018
You didn't attach lung.mat. But is this what you want:
% Create sample data.
data = randi(9, 32, 57); % Random integers in the range 1-9.
data(:, end) = randi(2, 32, 1); % Last column is 1 or 2 ONLY.
% Find out what rows are labeled 1 and 2
% by looking in the last column.
rowsLabeled1 = data(:, end) == 1;
rowsLabeled2 = data(:, end) == 2;
% Extract rows labeled 1 and 2 into their own matrices.
data1 = data(rowsLabeled1, :);
data2 = data(rowsLabeled2, :);
% You can get vectors from each column by extracting it into a new variable
% e.g. to get 2 vectors for column 5, do
col51 = data1(:, 5); % Get col 5 with label 1.
col52 = data2(:, 5); % Get col 5 with label 2.
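If you would rather keep the normal/tumor struct layout from the question, here is a minimal sketch of one way to do it. It assumes the last column holds the labels and uses random sample data in place of lung.mat. The struct array name S is made up for illustration; it is kept separate from the numeric matrix because the question's code used F for both. Note also that in the question's code the loops for z=1:s1 and for x=1:s2 never run, because s1 and s2 start at 0.

```matlab
% Sample data standing in for lung.mat (assumption: last column is the label, 1 or 2).
rng(1);                                  % Fixed seed so the run is repeatable.
data = randi(9, 32, 57);
data(:, end) = randi(2, 32, 1);

labels = data(:, end);
numFeatures = size(data, 2) - 1;         % Skip the label column itself.

% Preallocate a struct array. S is a hypothetical name, chosen so the
% numeric matrix is not overwritten.
S = repmat(struct('normal', [], 'tumor', []), 1, numFeatures);
for i = 1 : numFeatures
    S(i).normal = data(labels == 1, i);  % Elements of column i with label 1.
    S(i).tumor  = data(labels == 2, i);  % Elements of column i with label 2.
end
```

With this layout, S(5).normal and S(5).tumor hold the same values as col51 and col52 would for the same data.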
  14 Comments
Image Analyst on 29 Dec 2018
You already know how to use pdist2, and you can plot all those distances, and even get a histogram of them. If you want to split into two zones, you can use graythresh(), imbinarize() or kmeans(), though like before I think that makes little to no sense. You still haven't explained why. Anyway, you should use a fixed threshold for consistency. Using an automatic threshold that varies depending on how many points are class 1 or class 2 is not good for comparing data sets.
What if the distances were normally distributed? What does that mean? The numbers are uniformly distributed??? What if the distances had two clusters? What does that mean? That the measurements were in two tight clusters? It seems that by having the data for that measurement already labeled that someone has already somehow thresholded something, and it's probably the values themselves rather than the distance between them.
But go ahead and do it and show us the values and the histograms, and the distance values and the distance value histogram and we can see if the distance histogram gives any additional insight.
It would be easy for you to make up data sets that range from clustered to uniformly distributed and compute the distances in each case. For example, in my K Nearest Neighbor demo, I create two classes, each with a spread, and a separation between the two classes. Though it's in 2-D for 2 variables. You could actually just make two classes in 1-D simply by using rand() and randn() and setting the mean and spread for each class.
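As a rough sketch of that idea, the snippet below makes two 1-D classes, once with rand() (uniform) and once with randn() (normal), and computes every class 1 to class 2 distance. All the numbers (class sizes, means, spread, and the choice of spread/4 as the standard deviation) are assumptions picked for illustration. The distances use implicit expansion (R2016b or later) instead of pdist2.

```matlab
rng(3);                                   % Fixed seed so the run is repeatable.
numClass1 = 120;                          % Assumed class sizes.
numClass2 = 80;
mean1 = 25;                               % Assumed class means.
mean2 = 75;
spread = 25;                              % Assumed total width of each uniform class.

% Uniform classes: rand() - 0.5 is uniform on [-0.5, 0.5].
u1 = mean1 + spread * (rand(numClass1, 1) - 0.5);
u2 = mean2 + spread * (rand(numClass2, 1) - 0.5);

% Normal classes: standard deviation of spread/4 is an arbitrary choice.
g1 = mean1 + (spread / 4) * randn(numClass1, 1);
g2 = mean2 + (spread / 4) * randn(numClass2, 1);

% All pairwise distances: column vector minus row vector gives a
% numClass1-by-numClass2 matrix.
dUniform = abs(u1 - u2.');
dNormal  = abs(g1 - g2.');

subplot(1, 2, 1); histogram(dUniform); title('Uniform classes');
subplot(1, 2, 2); histogram(dNormal);  title('Normal classes');
```

Changing the means and spread so the classes overlap or separate shows how the shape of the distance histogram follows the shape of the measurements.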
Image Analyst on 29 Dec 2018
OK, I programmed up a simple Monte Carlo Simulation for you with uniform, non-overlapping distributions for two classes. It is attached. You can see the measurement values, the distance values, and the histogram of the distance values. I think you can do a lot of your experimentation and discovery of insights just by trying different distributions in a Monte Carlo fashion. For example, maybe the distribution of distances is the convolution of the distributions of the two measurement class distributions. What do you think?
% Program to do a Monte Carlo simulation of measurements between two classes of patients.
clc; % Clear the command window.
close all; % Close all figures (except those of imtool.)
imtool close all; % Close all imtool figures if you have the Image Processing Toolbox.
clear; % Erase all existing variables. Or clearvars if you want.
workspace; % Make sure the workspace panel is showing.
format long g;
format compact;
fontSize = 16;
% Specify parameters.
numClass1 = 120; % Number of measurements in class 1.
numClass2 = 80; % Number of measurements in class 2.
meanClass1 = 25;
meanClass2 = 75;
spread1 = 25;
spread2 = 25;
% Generate measurements
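% rand() - 0.5 is uniform on [-0.5, 0.5], so each class is centered on its mean with total width equal to its spread.
class1Values = meanClass1 + spread1 * (rand(numClass1, 1) - 0.5);
class2Values = meanClass2 + spread2 * (rand(numClass2, 1) - 0.5);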
% Plot measurements
subplot(2, 2, 1);
plot(class1Values, 'b*', 'MarkerSize', 10, 'LineWidth', 2);
hold on;
plot(class2Values, 'r*', 'MarkerSize', 10, 'LineWidth', 2);
xlabel('Measurement Number', 'FontSize', fontSize);
ylabel('Measurement Value', 'FontSize', fontSize);
title('Measurement Value for Every Patient', 'FontSize', fontSize);
grid on;
legend1 = sprintf('%d in Class 1', numClass1);
legend2 = sprintf('%d in Class 2', numClass2);
legend(legend1, legend2, 'location', 'east');
% Enlarge figure to full screen.
set(gcf, 'Units', 'Normalized', 'OuterPosition', [0, 0.04, 1, 0.96]);
drawnow;
% Compute distances of every point to every other point.
set1 = [zeros(length(class1Values), 1), class1Values];
set2 = [zeros(length(class2Values), 1), class2Values];
distances = pdist2(set1, set2);
subplot(2, 2, 2);
bar(distances(:)); % Flatten the distance matrix so each bar is one class 1 / class 2 pair.
grid on;
title('Distances between Class 1 Points and Class 2 Points', 'FontSize', fontSize);
xlabel('Pair Number', 'FontSize', fontSize);
ylabel('Distance between pair', 'FontSize', fontSize);
% Show histogram of distances.
subplot(2, 2, 3:4);
histogram(distances);
grid on;
caption = sprintf('Histogram of %d Distances between Class 1 Points and Class 2 Points', numel(distances));
title(caption, 'FontSize', fontSize);
xlabel('Distance', 'FontSize', fontSize);
ylabel('Count', 'FontSize', fontSize);
[Attached image: 0000 Screenshot.png]
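On the convolution question, here is a rough way to check it numerically. For independent values x and y, the density of y - x is the density of y convolved with the mirrored density of x. The sketch below assumes two non-overlapping uniform classes on [0, 25] and [50, 75], draws independent pairs rather than using all pairs from pdist2, and picks the sample count and bin width arbitrarily. None of it comes from the attached demo.

```matlab
rng(0);                                   % Fixed seed so the run is repeatable.
n = 1e5;                                  % Number of simulated pairs (assumed).
x = 25 * rand(n, 1);                      % Class 1: uniform on [0, 25] (assumed).
y = 50 + 25 * rand(n, 1);                 % Class 2: uniform on [50, 75] (assumed).
d = y - x;                                % Same as abs(y - x): the classes do not overlap.

% Estimate both class distributions on grids with the same bin width.
step = 0.5;
pX = histcounts(x, 0:step:25,  'Normalization', 'probability');
pY = histcounts(y, 50:step:75, 'Normalization', 'probability');

% Convolve the class 2 distribution with the mirrored class 1 distribution.
pD = conv(pY, fliplr(pX));

% Distance value for each entry of pD (difference of the bin centers).
dCenters = 50 + ((1:numel(pD)) - numel(pX)) * step;

% Compare the simulated distances with the convolution prediction.
histogram(d, 'BinWidth', step, 'Normalization', 'probability');
hold on;
plot(dCenters, pD, 'r-', 'LineWidth', 2);
grid on;
xlabel('Distance');
ylabel('Probability');
legend('Simulated distances', 'Convolution of class distributions');
```

For two uniform classes of equal width the red curve should come out roughly triangular, which you can compare against the shape of the distance histogram.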


More Answers (1)

Cris LaPierre on 27 Dec 2018
Your data is not attached, so there is nothing to test, but have you looked into using a table and the functions findgroups and splitapply? See some examples here.
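A minimal sketch of that approach, using random sample data in place of the missing file and assuming the last column holds the labels. It works on the numeric matrix directly rather than a table, and column 5 is picked arbitrarily.

```matlab
% Sample data standing in for the missing file (assumption: last column is the label, 1 or 2).
rng(2);                                   % Fixed seed so the run is repeatable.
data = randi(9, 32, 57);
data(:, end) = randi(2, 32, 1);

% Group number for every row, based on the label column.
G = findgroups(data(:, end));

% Collect the column 5 values of each group into one cell per group.
col5ByGroup = splitapply(@(x) {x}, data(:, 5), G);

% Apply a function per group, here the mean of column 5.
col5Means = splitapply(@mean, data(:, 5), G);
```

col5ByGroup{1} holds the column 5 values with label 1 and col5ByGroup{2} those with label 2.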
  1 Comment
phdcomputer Eng on 27 Dec 2018
Thanks greatly.
I attached a part of the data (lung1.mat).
In the following code, I used the pdist2 function to compute the distance between two column vectors using the Jaccard measure.
I typed this at the command line to see the distance result:
pdist2(data(:,2),data(:,2),'jaccard');
but there is an error:
Undefined function or variable 'data'.
I'll be grateful to have your opinion.
close all;
clc
load lung.mat
data=lung;
[n,m]=size(data);
rowslabled1=data(:,m)==1;
rowslabled2=data(:,m)==2;
data1=data(rowslabled1,:);
data2=data(rowslabled2,:);
for i=1: m
data1(:,i);
data2(:,i);
d=pdist2(data(:,i),data(:,i),'jaccard');
end

