Finding discrepancies between two spreadsheets in MATLAB

3 visualizaciones (últimos 30 días)
Thanh Nguyen
Thanh Nguyen el 21 de En. de 2020
Comentada: Adam Danz el 22 de En. de 2020
Good evening! I am currently trying to work on writing a script that will allow me to find discrepancies between two spreadsheets (.xls, .csv, .xlsm).
Currently, the two spreadsheets have thousands of rows but a set number of columns. In spreadsheet 1, there is a column called "Issue Key" that corresponds to another column in spreadsheet 2 that is labeled "Cross-reference." Is there a way to import the two spreadsheets and have MATLAB re-order the tables based on this similarity?
My current code reads the XLS and CVS files into two separate variables data1 and data2. One of the issues I notice is that I don't know how to declare the first row of each variable as the variable name. I'm not exactly sure if I'm even on the right path at this point so I'm very eager to listen to suggestions. I'm very new to this coding thing, aha!
This is my current code:
clc;
clear;
close all;
[baseName, folder] = uigetfile({'*.xls'; '*.csv'}, 'Select the RTC Export:');
filename = fullfile(folder, baseName);
% Reads data in from selected file and loads them into num (only numeric),
% txt (only string) and raw (all data)
[num, txt, raw] = xlsread(filename);
data = [raw(:,1) raw(:,2) raw(:,5) raw(:,11) raw(:,12)];
tabledata = cell2table(data);
[baseName, folder] = uigetfile({'*.xls'; '*.csv'}, 'Select the JIRA Export:');
filename = fullfile(folder, baseName);
[num, txt, raw] = xlsread(filename);
data2 = [raw(:,3) raw(:,5) raw(:,2) raw(:,1)];
tabledata2 = cell2table(data2);
  9 comentarios
Thanh Nguyen
Thanh Nguyen el 21 de En. de 2020
Well we know that for each Issue ID of table 1 is just Cross Reference of table 2. So whatever information is associated along the row of the ticket in reference should be associated with the row in the other table with the same cross reference number. Does that make sense?
Example:
Cross reference # 316216 belongs to testtable1 Issue-Key B-2 because of the matching IssueID.
I don't know if MATLAB has a function that finds matches on a per-row basis.
Adam Danz
Adam Danz el 21 de En. de 2020
Editada: Adam Danz el 21 de En. de 2020
Ah, I see. This comment in your quesiton was misleading.
"In spreadsheet 1, there is a column called "Issue Key" that corresponds to another column in spreadsheet 2 that is labeled "Cross-reference."

Iniciar sesión para comentar.

Respuesta aceptada

Adam Danz
Adam Danz el 22 de En. de 2020
Editada: Adam Danz el 22 de En. de 2020
This uses Walter's innerjoin suggestion and then sorts the combined table back into the IssueKey order. If you'd rather use the IssueID order, remove the last two lines.
% Read data in as tables
folder = 'C:\......';
baseName1 = 'testtable1.xlsx';
baseName2 = 'testtable2.xlsx';
filename1 = fullfile(folder, baseName1);
filename2 = fullfile(folder, baseName2);
T1 = readtable(filename1); % you'll get a warning that column names were modified
T2 = readtable(filename2); % you'll get a warning that column names were modified
% Look at the tables
head(T1)
head(T2)
% Change CrossReference to IssueID
T2.Properties.VariableNames(ismember(T2.Properties.VariableNames,'CrossReference')) = {'IssueID'};
% Join tables
T = innerjoin(T1,T2);
% or
% T = outerjoin(T1,T2);
% Resort new table based on original order in T1
% assumes that all rows of T1 are in T.
[~, sortIdx] = ismember(T1.IssueKey,T.IssueKey);
T = T(sortIdx,:);
Result
T =
15×6 table
IssueKey Summary IssueID Status DueDate OrderNo_
________ ___________________________ __________ __________ _____________ ________
{'B-1' } {'Bike chain broken.' } 6.5413e+05 {'Open' } {'1/22/2020'} 5492
{'B-2' } {'Bike light broken.' } 3.1622e+05 {'Open' } {'1/22/2020'} 8135
{'B-3' } {'Bike seat torn.' } 8.4162e+05 {'Open' } {'1/22/2020'} 4613
{'B-4' } {'Bike in good condition.'} 9.5322e+05 {'Closed'} {'Closed' } 1256
{'B-5' } {'Bike in good condition.'} 1.2319e+05 {'Closed'} {'Closed' } 1617
{'B-6' } {'Bike in good condition.'} 7.8551e+05 {'Closed'} {'Closed' } 6772
{'B-7' } {'Bike chain broken.' } 3.1549e+05 {'Open' } {'1/29/2020'} 4381
{'B-8' } {'Rusty bike chain.' } 9.6345e+05 {'Open' } {'1/22/2020'} 1269
{'B-9' } {'Handlebar bent.' } 4.5642e+05 {'Open' } {'1/21/2020'} 3028
{'B-10'} {'Bike seat torn.' } 2.3312e+05 {'Open' } {'1/22/2020'} 8684
{'B-11'} {'Bike chain broken.' } 1.7899e+05 {'Open' } {'1/24/2020'} 3432
{'B-12'} {'Bike in good condition.'} 5.1087e+05 {'Closed'} {'Closed' } 1251
{'B-13'} {'Bike in good condition.'} 5.4675e+05 {'Closed'} {'Closed' } 8155
{'B-14'} {'Bike chain broken.' } 1.012e+05 {'Open' } {'1/20/2020'} 9493
{'B-15'} {'Bike seat torn.' } 3.0126e+05 {'Open' } {'1/25/2020'} 5810
  4 comentarios
Thanh Nguyen
Thanh Nguyen el 22 de En. de 2020
First off, thank you again for your elegant answer. I think I'm about 50% there in terms of the vision of what I'd like. However, I failed to list a couple of things that may or may not mess up this code.
Let me rephrase this question:
Lets say I give you those same tables, except in some fields, there are missing data. For example, what if B-12 did not have the description "Bike in good condition"? Would MATLAB be able to account for entries like these? And what happens if I didn't have straight up numerical values that were similar?
Second question: What if the similar values were in the format ABCDE-12345? Example, what if the Cross Reference number and the Issue ID were in the ABCDE-12345 format instead? What is the workaround for such a case?
Adam Danz
Adam Danz el 22 de En. de 2020
"For example, what if B-12 did not have the description "Bike in good condition"? Would MATLAB be able to account for entries like these?"
You could simulate that to see what would happen.
T1.Summary{12} = '';
If you use outerjoin you'll see an empty indicator for that value. If you use innerjoin you'll see that the entire row is missing.
I'm not sure I undertand your second question. If there was no way to pair the rows of each table, then each row would be considered a unique row.

Iniciar sesión para comentar.

Más respuestas (0)

Categorías

Más información sobre Tables en Help Center y File Exchange.

Productos


Versión

R2017b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by