Minify Matlab code (minimize the number of characters)

Rik on 9 Oct 2020
Edited: Rik on 23 Dec 2020
I would like to find a way to reduce the footprint of a function as much as possible. The actual functionality should not change (although stripping the message part of errors and warnings would be fine).
My intended purpose is to be able to add a function as a dependency without having to fill pages and pages with details that are irrelevant in that context. Only the entry function name needs to stay the same; the rest is irrelevant. The resulting function should behave exactly the same and still be plain text.
Is there a good place to start? Is it possible to hook into the Matlab parser somehow to reliably determine which characters are comments (maybe by borrowing some of the work that has been done for Octave)? I searched Google (and the FEX) to see if something was already available, but queries like 'minimize Matlab code' obviously turn up a lot of unrelated results. I did find Matmini on GitHub, but it is poorly documented, 7 years old, and written in Python. I did get it to run, but it doesn't work as intended: the published version doesn't treat nargin and nargout as special tokens, doesn't seem to handle ... at all, replaces many functions with the compact strings, and mangles many (if not most) of the chars.
My current solution strips all line comments (i.e. lines starting with % or . when ignoring leading spaces) and removes all empty lines. This reduces my readfile function from 651 lines to 416, but I would prefer an even more drastic reduction. Matmini chops it down to 262 lines (if I first manually replace all ...\n), but then the code no longer works.
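txt=readfile('readfile.m'); % readfile returns a cell array with one char row per line
for n=1:numel(txt)
    s=txt{n};
    idx=find(s~=' ',1,'first');          % first non-space character
    if ~isempty(idx) && any(s(idx)=='%.')
        txt{n}='';                        % line starts with % (or with . as in a '...' line)
    end
    if all(s==' '),txt{n}='';end          % blank line (all() of an empty array is true)
end
txt(cellfun('isempty',txt))=[];          % remove every line marked above
fid=fopen('readfile_stripped.m','w');fprintf(fid,'%s\n',txt{:});fclose(fid);
%What I tried with Matmini (Python, called through the py interface):
alphabet=string(['a':'z' 'A':'Z']);
py.minify.minify_file("readfile_stripped.m",alphabet,"{'rename_vars'}",uint8(1))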
Edit:
Another use case would be places like the attached document. There I want to be able to add extra functions at the end that generate the data to run the performance test. Having a large code block there isn't a big problem, but I would like to have the option. (A document like this one should show up in the Examples tab on the FEX once the 'preview not available' bug for entries linked to GitHub is fixed.)
Rik on 12 Oct 2020
Edited: Rik on 13 Oct 2020
Meanwhile I was adapting that very same function. Attached is my result. It gobbles empty lines and correctly handles (and merges) comments after ellipses.
Edit: updated version that should also work for strings.
mainregex = [ ...
'( ' ... % Grouping parenthesis (content goes to $1).
' ( ^ | \n ) ' ... % Beginning of string or beginning of line.
' ( ' ... % Non-capturing grouping parenthesis.
' ' ...
'' ... % Match anything that is neither a comment nor a char array nor a string...
' ( ' ... % Non-capturing grouping parenthesis.
' [\]\)}\w.] ' ... % Either a character followed by
' ''+ ' ... % one or more transpose operators
' | ' ... % or else
' [^''"%](?!\.\.) ' ... % any character except a single/double quote (which
' ' ... % starts a char or string), or a percent (which starts
' ' ... % a comment), as long as it isn't the first dot of a ...
' ' ... % (requiring gobbling a newline). (note: this will generally
' ' ... % select 1 char too much for ...)
' )+ ' ... % Match one or more times.
' ' ...
'' ... % ...or...
' | ' ...
' ' ...
'' ... % ...match a char array.
' ( ' ... % Non-capturing grouping parenthesis.
' '' ' ... % Opening single quote that starts the char.
' [^''\n]* ' ... % Zero or more characters that are neither single quotes
' ' ... % (special) nor newlines (illegal).
' ( ' ... % Non-capturing grouping parenthesis.
' '''' ' ... % An embedded (literal) single quote character.
' [^''\n]* ' ... % Again, zero or more characters that are neither single quotes
' ' ... % nor newlines.
' )* ' ... % Match zero or more times.
' '' ' ... % Closing single quote that ends the char.
' ) ' ...
' ' ...
'' ... % ...or...
' | ' ...
' ' ...
'' ... % ...match a string.
' ( ' ...
' " ' ... % Opening double quote that starts the string.
' [^"\n]* ' ... % Zero or more characters that are neither double quotes
' ' ... % (special) nor newlines (illegal).
' ( ' ... % Non-capturing grouping parenthesis.
' "" ' ... % An embedded (literal) double quote character.
' [^"\n]* ' ... % Again, zero or more characters that are neither double quotes
' ' ... % nor newlines.
' )* ' ... % Match zero or more times.
' " ' ... % Closing double quote that ends the string.
' ) ' ...
' ' ...
' )* ' ... % Match zero or more times.
') ' ...
'[^\n]* ' ... % What remains must be a comment.
];
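To show how a regex of this shape is applied, here is a reduced single-line sketch of the same idea. It is my own simplification, not the attached function: it mirrors the code/char/string alternatives of mainregex but drops the ellipsis and newline handling, and it assumes the match is used with regexprep and '$1', so only the part before the comment survives. It shares the scalar-string transpose limitation described in the next comment.

```matlab
% Simplified single-line variant (assumptions: no '...' continuations and no
% %{ %} blocks). Code, transposes, char arrays and strings are captured in $1;
% a '%' that is left over must start a comment.
codeChar = '(?:[\]\)}\w.]''+|[^''"%])';   % transpose(s) after ] ) } word char or ., or any other code char
charArr  = '''[^'']*(?:''''[^'']*)*''';   % 'char array', where '' is an escaped quote
strLit   = '"[^"]*(?:""[^"]*)*"';         % "string", where "" is an escaped quote
pattern  = ['^((?:' codeChar '|' charArr '|' strLit ')*)%.*$'];
stripComment = @(txt) regexprep(txt, pattern, '$1');

stripComment('x = 1; % comment')   % returns 'x = 1; '
stripComment('s = ''50%'';')       % no comment, so the line is returned unchanged
stripComment('y = x''; % note')    % the quote after x is a transpose: 'y = x''; '
```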
Rik on 18 Nov 2020
While writing a tester, I noticed that the regex above doesn't correctly handle transposing a scalar string (although I am not sure why you would transpose a scalar), so it fails for the line of code below.
""'%'
% ^ transpose
%  ^ start of comment
Putting any character between the double and single quote fixes this issue (''"" fails as well, but that doesn't matter, as it is not valid Matlab syntax). My current test cases are posted below. This is the only failing case, but I'm not sure it is important enough to spend a lot of time fixing it.
%just a few examples of hard to parse syntaxes
'%".''...'%foo
%{
bar
%}
"foo"
bar.''
[foo...
bar]
[foo...foobar
bar]
""""'%'...
foo
[""]'.'%foo
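The %{ ... %} case in the list above is easier to handle per line than per character. A minimal sketch, assuming the code is already split into a cell array of lines (the example lines are made up): mark the lines that consist only of a block comment marker, track the nesting depth with cumsum, and drop every line inside a block plus the closing line itself. The minified f04 below uses the same cumsum trick.

```matlab
% Hypothetical input, one cell per line (block comments may be nested).
src = {'a=1;';'%{';'bar';'%}';'b=2;';'  %{';'%{';'nested';'%}';'still a comment';'%}';'c=3;'};
% A block comment marker must be alone on its line (apart from whitespace).
isOpen  = ~cellfun('isempty', regexp(src, '^\s*%\{\s*$', 'once'));
isClose = ~cellfun('isempty', regexp(src, '^\s*%\}\s*$', 'once'));
% Nesting depth after each line: a line is inside a block while the depth is
% positive, and the line with the closing marker has to go as well.
depth   = cumsum(double(isOpen) - double(isClose));
inBlock = depth > 0 | isClose;
src(inBlock) = [];   % leaves {'a=1;';'b=2;';'c=3;'}
```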


Accepted Answer

Rik on 14 Oct 2020
Edited: Rik on 23 Dec 2020
It took some time, but here is my solution.
  1. Strip all comments and line continuations.
  2. Sort the functions (putting the entry function(s) at the top).
  3. Separate the code from the embedded chars/strings.
  4. Parse the code to select only the places where a new variable may be created in the workspace.
  5. Use that (and the list of local functions) to create a dictionary of variable and function names.
  6. Remove the entry function(s) from that dictionary and replace every occurrence of a variable or local function with a shorter/pseudonymized one.
  7. Optional: replace all double spaces in the code with a single space.
  8. Put the code and chars/strings back together.
  9. Optional: merge and split lines so the result is shorter than a fixed length.
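Step 4 is the fragile part. The sketch below is a much cruder illustration than the real parser (f07/f08 in the listing below appear to do this job): the two regexes are my own assumption and only catch name= and [a,b]= assignments. Missing s in s.f=k is the kind of false negative that is tolerable, because it only means s keeps its original name.

```matlab
% Hypothetical line of code (literals already removed in step 3).
code = 'x=1;[y,~,z]=size(A);for k=1:3,s.f=k;end,if x==2,w=0;end';
% 'name=' (but not '==', and not a field such as '.f=').
lhsSingle = regexp(code, '(?<![\w.])([A-Za-z]\w*)\s*=(?!=)', 'tokens');
% '[a,b,...]=' with multiple outputs; '~' placeholders are not variables.
lhsMulti  = regexp(code, '\[([^\]]*)\]\s*=(?!=)', 'tokens');
names = [lhsSingle{:}];
for ii = 1:numel(lhsMulti)
    parts = regexp(lhsMulti{ii}{1}, '[\s,]+', 'split');
    keep  = ~strcmp(parts, '~') & ~cellfun('isempty', parts);
    names = [names parts(keep)]; %#ok<AGROW>
end
% names is now {'x','k','w','y','z'}; 's' is missed (a false negative)
```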
Any eval that relies on variable or function names is doomed to fail, but I think that is a feature, not a bug (although you can use func2str(@LocalFunction) to avoid this). The current version of the code also ignores the arguments block, so if a function uses one, there will very likely be issues.
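Steps 3, 6 and 8 and the eval caveat can be illustrated together. In this sketch the line, the name helper, the dictionary and both regexes are hypothetical simplifications, not the ones used by minify: the line is split into code and literals, identifiers are renamed in the code parts only, and the parts are interleaved again. The call inside the char array keeps the old name, which is why such an eval breaks.

```matlab
% Hypothetical line: 'helper' is called directly and once more through eval.
src  = 'y=helper(x);eval(''z=helper(2);'');';
dict = {'helper','f00';'x','v000';'y','v001'};   % assumed result of step 5
% Step 3: a char array starts at a quote that is not a transpose (i.e. not
% preceded by ] ) } a word character, a dot or a quote); strings use "".
literalExpr = '(?<![\]\)}\w.''"])''[^'']*(?:''''[^'']*)*''|"[^"]*(?:""[^"]*)*"';
[literals, codeParts] = regexp(src, literalExpr, 'match', 'split');
% Step 6: rename whole identifiers in the code parts only. The lookbehind
% skips struct fields (names preceded by a dot) and the lookahead prevents
% partial matches inside longer names.
for ii = 1:size(dict, 1)
    expr = ['(?<![\w.])' dict{ii,1} '(?!\w)'];
    codeParts = cellfun(@(c) regexprep(c, expr, dict{ii,2}), codeParts, 'UniformOutput', false);
end
% Step 8: interleave code and literals again.
out = codeParts{1};
for ii = 1:numel(literals)
    out = [out literals{ii} codeParts{ii+1}]; %#ok<AGROW>
end
% out is 'v001=f00(v000);eval(''z=helper(2);'');', so the eval would now
% call a function that no longer exists.
```

Writing eval([func2str(@helper) '(2)']) instead keeps helper in the code part, so it is renamed together with the function itself.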
I just posted this function to the FEX, and here it is as well (applied to itself). The result below uses a maximum line length of 150 characters. After writing a few tests (so I could confirm compatibility with Octave and old Matlab releases) I wrote a completely new comment stripper, which I might publish separately as well. It currently reduces the total length of a function by about 80-90% (in this specific case by 89%, from 1343 lines to 150).
% minify (version 1.0.1) was minified to make it more compact.
% For the normal version see https://www.mathworks.com/matlabcentral/fileexchange/84717
% The unaltered documentation can be found below.
%
% Process Matlab code into a solid block of compact and unreadable but functionally equivalent code.
% The output may require some manual tweaking to be optimally compact.
%
% Syntax:
% str_out=minify(str_in)
% str_out=minify(str_in,BeastMode)
% str_out=minify(str_in,BeastMode,max_length_after_merge)
% str_out=minify(str_in,BeastMode,max_length_after_merge,EntryFunctionNames)
%
% Arguments:
% str_out: mx1 cell array with the minified code
% str_in: nx1 cell array of char arrays
% BeastMode: boolean that triggers additional steps to further minify the code
% (specific aspects can be set individually by supplying a struct instead)
% max_length_after_merge: target maximum length (might be exceeded by original code after the
% removal of the line continuation)
% if <10 this is multiplied by the maximum line length preference
% (if you want more than 10x, you can provide a negative value)
% EntryFunctionNames: cell array with function names that should be kept unchanged, these
% functions will be moved to the top if there are multiple functions. This
% defaults to the first function (if any).
%
% BeastMode parameters:
% (missing fields will be filled with the default, BeastMode=false will set all fields to false)
% trim_spaces [true] Strip leading spaces and remove all double spaces that do
% not occur inside a char or a string.
% compress_to_block [false] Attempt to compress the entire input to a single block.
% Setting this to true will cause the resulting code to be
% incompatible between modern Matlab releases and Octave and
% ML6.5. The reason is that the former require a space
% between 'end' and 'function', while the latter require a
% comma or semicolon.
% compress_functions_separately [true] Compress each function to a separate block of code.
% If proper detection of nested function is implemented, this
% will compress each parent function including the nested
% functions to a single block.
% If compress_to_block is set to true, this is ignored.
% keep_original_function_names [false] Do not rename local functions.
% This setting can be used if any code uses eval with the
% name of a local function in a char or a string. (you can
% use func2str(@local_function) to work around this issue)
% contains_nested_functions [false] Setting this to true will skip the steps that would break
% nested functions.
%
% Example:
% str=readfile('readfile.m');
% str=minify(str);
% fid=fopen(fn_out,'w');fprintf(fid,'%s\n',str{:});fclose(fid);
%
% The likely use cases fall generally under two categories:
% 1) Attaching code as a dependency to allow your code to run, without having to refer to
% separate FEX/Github entries. In some such places understanding the attached code is not
% important, while space is at a premium (e.g. pdf attachments, where large amounts of
% dependent code would be distracting).
% 2) Code obfuscation. Since p-code will only run on a subset of Matlab releases (and not at all
% on GNU Octave), using that will limit compatibility. Additionally, given that the
% encryption has been broken, this might be a good additional step (or replacement) to hide
% the function of your code without harming its function.
%
% The process of compacting Matlab code is split into several steps:
% 1) Strip all comments and line continuations.
% 2) Sort the functions (if there are multiple) in order to keep the entry function(s) at the
% top of the output.
% 3) Separate the code and the embedded chars/strings.
% 4) Parse the code to select only the places where a new variable may be created in the
% workspace. False positives should be avoided at all costs. False negatives should be
% avoided, but are not a major issue.
% 5) Use that (and the list of local functions) to create a dictionary of variable and function
% names.
% 6) Remove the entry function from that dictionary and replace every occurrence of a variable
% or local function with a shorter/pseudonymized one.
% 7) (BeastMode==true) Replace all double spaces in code with single space.
% 8) Put the code and chars/strings back together.
% 9) (BeastMode==true) Merge lines if the result is shorter than a fixed length. A space, comma,
% or semicolon may be added. Be careful with functions that rely on printing results to the
% command window by omitting the semicolon, although such functions should probably not be
% minified in the first place (as the variable names are changed).
% Steps 3-8 should be performed for each function separately.
%
% This function was tested on a random sample of 1000 m files from the FileExchange. Some limits
% were imposed on the selection of files: only submissions with 5 downloads or more in the last 30
% days, at most 5 files per submission (taking the first files in the hierarchy, without checking
% if those actually would contain the main function), ignoring functions with fewer than 5 or more
% than 1000 lines, and ignoring functions with lines over 1000 characters long (as those probably
% contain data, not real code). If BeastMode is set to true the size of functions tends to be
% reduced to about 13% of the original number of lines (half of them are between 9.5% and 16.7%).
% Setting BeastMode to false will increase the variability by a lot, and results in file sizes of
% about 51% of the original number of lines (38.3-66.5%).
% The compression depends mostly on the amount of comments, typical line length, amount and length
% of chars/strings, and typical variable name length.
%
% Compatibility considerations:
% - Support for eval and friends is limited to situations where they don't rely on variable/function
% names (e.g. if it is used to create an anonymous function). You could use something like
% func2str(@local_function) to use a local function call inside an eval statement.
% - Nested functions are tricky to extract. It is possible (match up end statements with 'if',
% 'try', 'while', 'for', 'parfor', and 'function', then confirm every function has an 'end' and
% ignore nested functions while sorting the functions). I don't use them (as they are
% incompatible with Matlab 6.5 and Octave), but it isn't impossible to modify this function. You
% can set contains_nested_functions to true to turn off the parts that will interfere with nested
% functions.
% - There is no support for the arguments block. This is not a fundamental issue, it is just not
% yet implemented in this function.
%
% _____________________________________________________________________________
% | Compatibility | Windows XP/7/10 | Ubuntu 20.04 LTS | MacOS 10.15 Catalina |
% |-----------------|-----------------|------------------|----------------------|
% | ML R2020b | W10: works | not tested | not tested |
% | ML R2018a | W10: works | works | not tested |
% | ML R2015a | W10: works | works | not tested |
% | ML R2011a | W10: works | works | not tested |
% | ML R2010b | not tested | works | not tested |
% | ML R2010a | W7: works | not tested | not tested |
% | ML 7.1 (R14SP3) | XP: works | not tested | not tested |
% | ML 6.5 (R13) | W10: works | not tested | not tested |
% | Octave 6.1.0 | W10: works | not tested | not tested |
% | Octave 5.2.0 | W10: works | works | not tested |
% | Octave 4.4.1 | W10: works | not tested | works |
% """""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
% note: Octave and ML6.5 require a semicolon or comma between 'end' and 'function', while newer
% Matlab releases either require or allow a space. Setting compress_to_block to true will make the
% resulting code incompatible between the two styles.
%
% Version: 1.0.1
% Date: 2020-12-23
% Author: H.J. Wisselink
% Licence: CC by-nc-sa 4.0 ( https://creativecommons.org/licenses/by-nc-sa/4.0 )
% Email = 'h_j_wisselink*alumnus_utwente_nl';
% Real_email = regexprep(Email,{'*','_'},{'@','.'})
function v000=minify(v000,v001,v002,v003,...
v004),persistent v005,if isempty(v005),v005 = exist('OCTAVE_VERSION', 'builtin') ~= 0;end,if nargin<1,error('not enough inputs');end,if nargin<2 ...
|| isempty(v001),v001=true;end,if nargin<3 || isempty(v002),v002=1.5;end,if nargin<5,v004='';end,v006=false;try v006=isequal(v000,cellstr(v000));
catch,end,if ~v006,error('first input must be a cellstr');end,v001=f15(v001);if v002<10,v002=ceil(abs(v002)*f12);end,f18(v004);v000=f04(v000);
if numel(v000)==0,return,end,if nargin<4 || isempty(v003),v003=f05(v000{1});end,v003=cellstr(v003);if ~v001.contains_nested_functions,...
v007=f17(v000,v003);else,v007={v000};end,if v001.keep_original_function_names,v008=cell(0,2);else,[v009,v010]=f19(vertcat(v007{:}));
v008=f05(v010);v008(ismember(v008,v003))=[];for v011=1:numel(v008),v008{v011,2}=sprintf('f%02d',v011-1);end,if v001.contains_nested_functions,...
v012=f07(v010);for v011=1:numel(v012),v012{v011,2}=sprintf('v%03d',v011-1);end,end,end,persistent v013,if isempty(v013),v013=f03('>=',7,'Octave',...
'>',0);end,for v014=1:numel(v007),v000=v007{v014};[v015,v010]=f19(v000);v016=f07(v010);if v001.contains_nested_functions,v017=ismember(v012(:,...
1),v016);v018=[v008;v012(v017,:)];else,for v011=1:numel(v016),v016{v011,2}=sprintf('v%03d',v011-1);end,v018=[v008;v016];end,if ...
numel(v018)==0,v018=cell(0);end,v019=size(v015,1)==1;if v019,v015(2,1)={' '};end,v020=~cellfun('isempty',v015);v020(:,2:2:end)=false;if v013,for ...
v021=1:size(v018,1),v022=['(^|(?<=[^a-zA-Z0-9_\.]))(' v018{v021,1} ')(?=[^a-zA-Z0-9_]|$)'];for v023=find(v020).',v015{v023}=regexprep(v015{v023},...
v022,v018{v021,2});end,end,else,for v021=1:size(v018,1),v022=['([^a-zA-Z0-9_\.])(' v018{v021,1} ')([^a-zA-Z0-9_])'];for v023=find(v020).',...
v000=[' ' v015{v023} ' '];v000=regexprep(v000,v022,['$1' v018{v021,2} '$3'],'tokenize');v000=regexprep(v000,v022,['$1' v018{v021,2} ...
'$3'],'tokenize');v015{v023}=v000(2:(end-1));end,end,end,if v001.trim_spaces,for v023=find(v020).',v024=inf;while v024~=0,v025=length(v015{v023});
v015{v023}=strrep(v015{v023},' ',' ');v026=length(v015{v023});v024=v025-v026;end,end,for v027=1:size(v020,1),if numel(v015{v027,...
1})>=1 && strcmp(v015{v027,1}(1),' '),v015{v027,1}(1)='';end,end,end,if v019,v015(2,:)=[];end,for v027=1:size(v015,1),v028=v015(v027,:);
if v005,v028(cellfun('isempty',v028))={''};end,v015{v027,1}=[v028{:}];end,v000=v015(:,1);v007{v014}=v000;end,if v001.compress_to_block,...
if numel(v003)>1,v029=cell(1,1+numel(v003));v029(1:numel(v003))=v007(1:numel(v003));if numel(v007)>numel(v003),v030=v007((numel(v003)+1):end);
v029{end}=vertcat(v030{:});else,v029(end)=[];end,for v011=1:numel(v029),v029{v011}=f00(v029{v011},v002);end,v000=vertcat(v029{:});
else,v000=vertcat(v007{:});v000=f00(v000,v002);end,elseif v001.compress_functions_separately,for v011=1:numel(v007),if numel(v007{v011})==1,...
continue,end,v007{v011}=f00(v007{v011},v002);end,v000=vertcat(v007{:});else,v000=vertcat(v007{:});end,if nargout==0,clear v000,end,end
function v000=f00(v001,v002),v001=f10(v001,...
inf,false);v003=v001{1};v001=f11(v003);v004=cell(size(v001));for v005=1:size(v004,2),v006=zeros(size(v001{1,v005}));v006(v001{1,v005}==';')=1;
v006(v001{1,v005}==',')=2;v006(v001{1,v005}==' ')=3;v006(v001{1,v005}=='+')=4;v006(v001{1,v005}=='-')=4;v006(v001{1,v005}=='@')=4;v004{1,...
v005}=v006;v004{2,v005}=zeros(size(v001{2,v005}));end,v004=horzcat(v004{:});v007=fliplr(v004);v008=fliplr(v003);v000=cell(ceil(numel(v004)/v002),1);
v009=1;while ~isempty(v008),if v009>1 && strcmp(v000{end-1},'...'),v009=v009-1;v010=find(v007>0);v007(v010(1))=0;end,v010=find(v007>0);
if isempty(v010) || numel(v008)<=v002,v000{v009}=v008(end:-1:1);break,end,v010(v010>v002)=[];if isempty(v010),v010=find(v007>0);v010=min(v010);
else,v011(1)=max([-inf v010(v007(v010)==1)]);v011(2)=max([-inf v010(v007(v010)==2)]);v011(3)=max([-inf v010(v007(v010)==3)]);v011(4)=max([-inf ...
v010(v007(v010)==4)]);[v012,v013]=max(v011-[0 5 8 15]);v010=v011(v013);v014=v007(v010)~=1;end,v010=max(1,v010-1);v000{v009}=fliplr(v008(1:v010));
v009=v009+1;v008(1:v010)='';v007(1:v010)='';if v014,v008=['...' v008];v007=[0 0 0 v007];end,end,v000=flipud(v000);end
function v000=f01(v001),v001=[' ' v001 ' '];v002=strfind(v001,'''');v003=strfind(v001,'"');v000=zeros(size(v001));
for v004=sort([v002 v003]),if v000(v004)==-1,continue,end,if strcmp(v001(v004),''''),if v000(v004)==5,continue,end,if v000(v004)==2,...
if strcmp(v001(v004+1),''''),v000(v004+1)=-1;else,v000(v004)=3;v000((v004+1):end)=0;end,else,if v000(v004-1)~=3 && ~isempty(regexp(v001(v004-1),...
'[\]\)}.\w''"]'));else;v000(v004)=1;v000((v004+1):end)=2;end;end;else;if v000(v004)==2,continue,end;if v000(v004)==5;if strcmp(v001(v004+1),'"');
v000(v004+1)=-1;else;v000(v004)=6;v000((v004+1):end)=0;end;else;v000(v004)=4;v000((v004+1):end)=5;end;end;end;v000([1 end])=[];end
function [v000,v001]=f02(v002),v003=[' ' v002(1:(end-1))];v004 = double(v002=='[') - double(v003==']');v005 = double(v002=='(') ...
- double(v003==')');v006 = double(v002=='{') - double(v003=='}');v007 = ( cumsum(v004)+cumsum(v005)+cumsum(v006) ) == 0;v001=find( v007 & ...
( v002==',' | v002==';' ) );if ~isempty(v001),if all(ismember(v002(v001(end):end),',; ')),v001(end)=[];end,end,if isempty(v001),v000={v002};v001=0;
else,v000=cell(1+numel(v001),1);v001=[0 v001 numel(v002)+1];for v008=1:numel(v000),v000{v008}=v002( (v001(v008)+1):(v001(v008+1)-1) );end,end,end
function v000=f03(v001,v002,v003,v004,v005),persistent v006 v007 v008,...
if isempty(v006),v008=exist('OCTAVE_VERSION', 'builtin');v006=version;v009=strfind(v006,'.');if numel(v009)~=1,v006(v009(2):end)='';v009=v009(1);
end,v006=[str2double(v006(1:(v009-1))) str2double(v006((v009+1):end))];v006=v006(1)+v006(2)/100;v006=round(100*v006);v007={ 'R13' 605;'R13SP1' 605;
'R13SP2' 605;'R14' 700;'R14SP1' 700;'R14SP2' 700;'R14SP3' 701;'R2006a' 702;'R2006b' 703;'R2007a' 704;'R2007b' 705;'R2008a' 706;'R2008b' 707;
'R2009a' 708;'R2009b' 709;'R2010a' 710;'R2010b' 711;'R2011a' 712;'R2011b' 713;'R2012a' 714;'R2012b' 800;'R2013a' 801;'R2013b' 802;'R2014a' 803;
'R2014b' 804;'R2015a' 805;'R2015b' 806;'R2016a' 900;'R2016b' 901;'R2017a' 902;'R2017b' 903;'R2018a' 904;'R2018b' 905;'R2019a' 906;'R2019b' 907;
'R2020a' 908;'R2020b',909};end,if v008,if nargin==2,warning('HJW:ifversion:NoOctaveTest',['No version test for Octave was provided.',...
char(10),'This function might return an unexpected outcome.']),if isnumeric(v002),v010=0.1*v002+0.9*fix(v002);v010=round(100*v010);
else,v011=ismember(v007(:,1),v002);if sum(v011)~=1,warning('HJW:ifversion:NotInDict','The requested version is not in the hard-coded list.'),...
v000=NaN;return,else,v010=v007{v011,2};end,end,elseif nargin==4,[v001,v010]=deal(v003,v004);v010=0.1*v010+0.9*fix(v010);
v010=round(100*v010);else,[v001,v010]=deal(v004,v005);v010=0.1*v010+0.9*fix(v010);v010=round(100*v010);end,else,if isnumeric(v002),...
v010=0.1*v002+0.9*fix(v002);v010=round(100*v010);else,v011=ismember(v007(:,1),v002);if sum(v011)~=1,warning('HJW:ifversion:NotInDict',...
'The requested version is not in the hard-coded list.'),v000=NaN;return,else,v010=v007{v011,2};end,end,end,switch v001,case '==',...
v000= v006 == v010;case '<' , v000= v006 < v010;case '<=', v000= v006 <= v010;case '>' , v000= v006 > v010;case '>=', v000= v006 >= v010;end,end
function v000 = f04(v001),v002=isa(v001,'string');if v002,...
v001=cellstr(v001);end,if isa(v001,'cell'),for v003=1:numel(v001),if (ndims(v001{v003}) > 2) || (any(size(v001{v003}) > 0) && (size(v001{v003},...
1) ~= 1)),error('HJW:StripComments:InvalidInput','All character arrays must be row vectors.');end,end,end,v004=ischar(v001);if v004,[v001,v005,...
v006]=f20(v001);else,v005='';v006=false;for v003=1:numel(v001),[v001{v003},v007,v008]=f20(v001{v003});v006=v006 || v008;if numel(v001{v003})>1,...
v005=v007;end,end,if isempty(v005),v005=v007;end,v001=vertcat(v001{:});end,v009=cell(1,2);for v003=1:2,if v003==1,v010='\{';else,v010='\}';
end,v009{v003}=regexp(v001,['^[^\S]*%' v010 '[^\S]*$']);if isa(v009{v003},'double'),v009{v003}=v009(v003);end,v009{v003}=~cellfun('isempty',...
v009{v003});end,v009=cumsum(v009{1}-v009{2})>0 | v009{2};v001(v009)=[];v000 = v001;for v003 = 1:numel(v000),v011 = v000{v003};v011 = f13(v011);
v012=isspace(v011);if all(v012),v013=0;else,v013=find(~v012);end,if ~isempty(v013)&&v013(end)<numel(v011),v011((v013(end)+1):end)=[];end,...
v000{v003} = v011;end,for v003 = numel(v000):-1:1,v014=v001{v003}((1+numel(v000{v003})):end);[v015,v016]=regexp(v014,'[\s]*[^\s\.%]?[\s]*\.{3}');
if ~isempty(v016)&&v015(1)==1&&numel(v001)>v003,v013=find(~isspace(v000{v003+1}));if ~isempty(v013)&&v013(1)>1,v013=v013(1);else,...
v013=1;end,if numel(v000{v003})>=1 && any(v000{v003}(end)=='~,;+-*/^\@<>&|='),v017='';else,v017=' ';end,if v013==1,v000{v003}=[v000{v003} v017 ...
v000{v003+1}];v000(v003+1)=[];else,v000{v003}=[v000{v003} v017 v000{v003+1}(v013:end)];v000(v003+1)=[];end,end,end,v000(cellfun('isempty',v000))=[];
if v004,v000 = sprintf(['%s' v005],v000{:});if ~v006,try v000((end-numel(v005)+1):end) = '';catch,end,end,elseif v002,v000=string(v000);end,end
function [v000,v001,v002]=f05(v003),v004=isa(v003,...
'char');if v004,v003={v003};end,[v005,v003]=f19(v003);v006=warning('off','MATLAB:REGEXP:deprecated');v000=cell(0);v001=zeros(0);v002=zeros(0);
v007=0;for v008=1:numel(v003),[v009,v010]=f02(v003{v008});for v011=1:numel(v009),v012=f06(v009{v011});if ~isempty(v012),v007=v007+1;v000{v007,...
1}=v012;v001(v007,1)=v008;v002(v007,1)=v010(v011);end,end,end,if numel(v000)==0,v000={'function'};end,if v004,v000=v000{1};end,warning(v006);end
function v000=f06(v001),v000='';v002='\s*function[\s\[](.*)';v003=regexp(v001,v002,'once');if ~isempty(v003);v004=regexprep(v001,...
v002,'$1', 'tokenize');v004=v004( max([1 1+strfind(v004,'=')]) : min([strfind(v004,'(')-1 numel(v004)]) );v000=strrep(v004,' ','');end;end
function v000=f07(v001),v002=warning('off','MATLAB:REGEXP:deprecated');
persistent v003,if isempty(v003),try unique([1 1 2],'stable');v003=false;catch,v003=true;end,end,v000=cell(0);v004=0;v005={'varargin';
'varargout'};v006=2;for v007=1:numel(v001),v008=f02(v001{v007});for v009=1:numel(v008),[v000,v004,v005,v006]=f08(v008{v009},v000,v004,v005,v006);
end,end,if ~v003,v000=unique(v000,'stable');else,[v010,v011]=unique(v000);v000=v000(sort(v011));end,v000(ismember(v000,v005))=[];warning(v002),end
function [v000,v001,v002,v003]=f08(v004,v000,v001,v002,v003),persistent v005,if ...
isempty(v005),v005 = exist('OCTAVE_VERSION', 'builtin') ~= 0;end,v006=[' ' v004];v007='\s*function[\s\[](.*)';v008=regexp(v004,v007,'once');if ...
~isempty(v008);v006=regexprep(v004,v007,'$1', 'tokenize');v009='(^)|(\]?\s*=)\s*(([a-zA-Z]+[_a-zA-Z0-9]*))[^_a-zA-Z]';[v010,v011]=regexp(v006,v009);
try v012=v006(v010(1):v011(1));v012=regexprep(v012,'[^_a-zA-Z0-9]*([_a-zA-Z0-9]*)[^_a-zA-Z0-9]*','$1','tokenize');catch,v012=regexprep(v006,...
v009,'$1','tokenize');end,v006=regexprep(v006,['\]?\s*=\s*' v012 '\s*\((.*)\)'],',$1]', 'tokenize');v006=strrep(v006,v012,'');if ...
isempty(v006),return,end;[v000,v001]=f09(v006,v000,v001);return;end;v013=zeros(size(v006));v014 ='{';v013( v006==v014 )= 1;v015='}';v013([false ...
v006==v015])=-1;v013=cumsum(v013);v006(v013(1:(end-1))>0)='';v013=zeros(size(v006));v014 ='(';v013( v006==v014 )= 1;v015=')';v013([false ...
v006==v015])=-1;v013=cumsum(v013);v006(v013(1:(end-1))>0)='';v016={'[;,\s]' , '\s+(([a-zA-Z][a-zA-Z0-9_]*\s*)*)'};v017= [v016{1} 'persistent' ...
v016{2}];if v005;if isempty(regexp(v006,v017, 'once'));else;v018=regexprep(v006,v017,'$1');[v000,v001]=f09(v018,v000,v001);v006=regexprep(v006,...
v017,'');end;else;[v019,v020,v018]=regexp(v006,v017);for v021=numel(v018):-1:1;[v000,v001]=f09(v006(v018{v021}(1):v018{v021}(2)),v000,v001);
v006(v019(v021):v020(v021))='';end;end;v017= [v016{1} 'global' v016{2}];if v005;if isempty(regexp(v006,v017, 'once'));else;v018=regexprep(v006,...
v017,'$1');[v002,v003]=f09(v018,v002,v003);v006=regexprep(v006,v017,'');end;else;[v019,v020,v018]=regexp(v006,v017);for v021=numel(v018):-1:1;
[v002,v003]=f09(v006(v018{v021}(1):v018{v021}(2)),v002,v003);v006(v019(v021):v020(v021))='';end;end;v006=strrep(v006,'==','<');v006=strrep(v006,...
'~=','<');v006=strrep(v006,'>=','<');v006=strrep(v006,'<=','<');for v022=numel(v006):-1:1;v008=find(v006=='=');if isempty(v008);break;else;
v006(v008(end):end)='';v008={max(find(v006=='[')), max([find(v006==',') find(v006==';')]), 1};v008=[v008{:}];v008=v008(1);v023=v006(v008(end):end);
v006(v008(end):end)='';v023=regexprep(v023,'\s+for\s+','');v023=regexprep(v023,'\s+try\s+','');[v000,v001]=f09(v023,v000,v001);end;end;end
function [v000,v001]=f09(v002,...
v000,v001),[v003,v004,v005]=regexp(v002,'\[(.*)\]|((.*)?)');v002=v002(v005{1}(1):v005{1}(2));v006 = '[^a-zA-Z_\.]([a-zA-Z][a-zA-Z0-9_]*)';
v002=[' ' v002];[v003,v004,v005]=regexp(v002,v006);for v007=1:numel(v005),v001=v001+1;v000{v001,1}=v002(v005{v007}(1):v005{v007}(2));end,end
function v000=f10(v000,v001,v002),if nargin<3,v002=true;end,...
v003=f14(v000);for v004=(numel(v000)-1):-1:1,v005=v000{v004};v006=v000{v004+1};v006=regexprep(v006,'^\s*','');if v002,if strcmp(v006(1:(min(end,...
8))),'function'),continue,end,end,if ~strcmp(v005(end),';') && ~strcmp(v005(end),','),v005=[v005 v003(v004)];end,if strcmp(v005((max(1,...
end-3)):max(1,end-1)),'try'),v005(end)=' ';end,if numel(v005)+numel(v006) <= v001,v000{v004}=[v005 v006];v000(v004+1)=[];end,end,end
function v000=f11(v001),v001=[' ' v001 ' '];v002=f01(v001);v003=diff(v002==0);v004=[1 ...
find(v003==1)+1 ];v005=[ find(v003==-1) numel(v002)];v000=cell(2,numel(v004));for v006=1:numel(v004),v007=v004(v006);v008=v005(v006);v000{1,...
v006}=v001(v007:v008);v007=v005(v006)+1;v008=v004(min(v006+1,end))-1;v000{2,v006}=v001(v007:v008);end,v000{1, 1 }( 1 )='';v000{1,end}(end)='';end
function v000=f12(varargin),...
persistent v001 v002,if isempty(v001),v001.isOctave=exist('OCTAVE_VERSION', 'builtin');if ~v001.isOctave,v001.settings=f03('>=',...
'R2018a');v001.com.mathworks=f03('>=','R13');v001.R13=f03('==','R13');end,if v001.isOctave,v002=80;elseif f03('<=','R2010b'),v002=75;else,v002=80;
end,end,if v001.isOctave,v000=0;elseif v001.settings,v003 = settings;v000=v003.matlab.editor.displaysettings.linelimit.LineColumn.ActiveValue;
elseif v001.com.mathworks,v000=com.mathworks.services.Prefs.getIntegerPref('EditorRightTextLineLimit');
elseif v001.R13,v000=com.mathworks.services.Prefs.getIntegerPref('EditorMaxCommentWidth');
else,v000=80;end,v000=double(v000);if isempty(v000) || v000==0,v000=v002;end,end
function ...
v000=f13(v001),[v002,v003]=f20(v001);if numel(v002)>1,for v004=1:numel(v002),v002{v004}=f13(v002{v004});end,v002=v002.';v002(2,1:(end-1))={v003};
v002{end}='';v000=horzcat(v002{:});if isa(v001,'string'),v000=string(v000);end,return,else,v001=v002{1};end,v005=f01(v001);v006=strfind(v001,'%');
v007=strfind(v001,'...');v008=[v006(v005(v006)==0) v007(v005(v007)==0)];if ~isempty(v008),v001(min(v008):end)='';end,v000=v001;end
function v000=f14(v001),v000=repmat(',',numel(v001),1);v002=cumsum(cellfun('prodofsize',...
v001));v001=horzcat(v001{:});v003 = double(v001=='[') - double(v001==']');v004 = double(v001=='{') - double(v001=='}');
v005 = ( cumsum(v003)+cumsum(v004) );v006=v005(v002)~=0;v000(v006)=';';v006=ismember(v002,strfind(v001,'endfunction')+2);v000(v006)=f18;end
function v000=f15(v001),persistent v002 v003,if isempty(v002),...
v002=struct;v002.trim_spaces=true;v002.compress_to_block=false;v002.compress_functions_separately=true;v002.keep_original_function_names=false;
v002.contains_nested_functions=false;v003=v002;v004=fieldnames(v003);for v005=1:numel(v004),v003.(v004{v005})=false;end,...
end,if ~isa(v001,'struct'),[v006,v001]=f21(v001);if ~v006 || v001,v000=v002;return,else,v000=v003;return,end,end,v000=v002;v004=fieldnames(v001);
for v005=1:numel(v004),[v006,v007]=f21(v001.(v004{v005}));if v006,v000.(v004{v005})=v007;else,v000.(v004{v005})=true;end,end,end
function varargout=f16(v000,v001,varargin),if nargin<3,error('HJW:regexp_outkeys:SyntaxError',...
'No supported syntax used: at least 3 inputs expected.'),end,if ~(ischar(v000) && ischar(v001)),error('HJW:regexp_outkeys:InputError',...
'All inputs must be char vectors.'),end,persistent v002,if isempty(v002),v002.match = f03('<','R14','Octave',...
'<',4);v002.split = f03('<','R2007b','Octave','<',4);end,varargout=cell(size(varargin));for v003=1:(nargin-2),if ~ischar(varargin{v003}),...
error('HJW:regexp_outkeys:InputError','All inputs must be char vectors.'),end,switch lower(varargin{v003}),case 'match',if v002.match,...
[v004,v005]=regexp(v000,v001);v006=cell(1,numel(v004));for v007=1:numel(v004),v006{v007}=v000(v004(v007):v005(v007));end,else,v006=regexp(v000,...
v001,'match');end,varargout{v003}=v006;case 'split',if v002.split,[v004,v005]=regexp(v000,v001);v008=cell(1,numel(v004)+1);v009=[v004 ...
numel(v000)+1];v010=[0 v005];for v007=1:numel(v009),v008{v007}=v000((v010(v007)+1):(v009(v007)-1));end,else,v008= regexp(v000,v001,'split');
end,varargout{v003}=v008;otherwise,v011=fieldnames(v002);v012=['Extra regexp output type not implemented, only the following types are ',...
'implemented:',char(10),sprintf('%s, ',v011{:})];v012((end-1):end)='';error('HJW:regexp_outkeys:NotImplemented',v012),end,end,end
function v000=f17(v001,v002),[v003,v004,v005]=f05(v001);if numel(v004)<=1,...
v000={v001};return,end,if any(v005>1),v006=unique(v004(v005>1)).';v006=v006(end:-1:1);for v007=v006,v008=v001{v007};v009=v005(v004==v007).';
v009(v009==0)=[];v010=diff([0 v009 numel(v008)]);v008=mat2cell(v008,1,v010).';v001=[v001(1:(v007-1));v008;v001((v007+1):end)];
end,[v003,v004]=f05(v001);end,[v011,v012]=ismember(v003,v002);[v013,v014]=sort(v003(~v011));v012(~v011)=sum(v011)+v014;[v013,v012]=sort(v012);
v000=cell(numel(v004),1);v004=[v004;numel(v001)+1];for v015=1:numel(v000),v000{v015}=v001(v004(v015):(v004(v015+1)-1));end,v000=v000(v012);end
function v000=f18(v000),persistent ...
v001,if nargin>0, v001=v000;end,if isempty(v001),if f03('<',7,'Octave','>',0),v001=',';else, v001=' ';end,end,if nargout>0,v000=v001;end,end
function [v000,v001]=f19(v002),persistent v003,if isempty(v003),v003 = exist('OCTAVE_VERSION',...
'builtin') ~= 0;end,v000=cell(numel(v002),1);v001=cell(numel(v002),1);for v004=1:numel(v002),v005=f11(v002{v004});v000(v004,...
1:2:(2*size(v005,2)))=v005(1,:);v000(v004,2:2:(2*size(v005,2)-1))=v005(2,1:(end-1));for v006=1:(size(v005,2)-1),v007=v005{2,v006}(1);v005{2,...
v006}=[v007 repmat('_',1,numel(v005{2,v006})-2) v007];end,if v003,v005(cellfun('isempty',v005))={''};end,v005=[v005{:}];v001{v004}=v005;end,end
function ...
[v000,v001,v002]=f20(v003),v003=char(v003);v002=false;if ~( any(v003==10) || any(v003==13) ),if ispc,v001=char([13 10]);else,v001=char(10);end,...
v000={v003};else,v004=find(v003==10 | v003==13);v004=v004(1:min(2,end));if numel(v004)==2 &&( diff(v004)~=1 || v003(v004(1))==v003(v004(2)) ),...
v004(2)=[];end,v001=v003(v004);if numel(v003)>1,v002=strcmp(v001,v003((end-numel(v001)+1):end));end,v000=f16(v003,v001,'split');v000=v000(:);end,end
function [v000,v001]=f21(v001),persistent v002,if isempty(v002),v002={true,false;1,0;
'on','off'};try v002(end+1,:)=eval('{"on","off"}');catch,end,end,v000=true;try for v003=1:size(v002,1),for v004=1:2,if isequal(v001,v002{v003,...
v004}),v001=v002{1,v004};return,end,end,end,if isa(v001,'matlab.lang.OnOffSwitchState'),v001=logical(v001);return,end,catch,end,v000=false;end

More Answers (1)

J. Alex Lee on 12 Oct 2020
Partial answer: this implements Rik's original stripping of blank lines and full-line comments, and also attempts to merge line continuations. It may not reliably detect ellipses followed by comments. I slightly modified the regex to catch comments after an ellipsis, but I am not sure whether that is better or worse.
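txt = readfile("readfile.m"); % use Rik's own readfile to return a cell array of char arrays
% simple stripping of full comment lines and empty lines
txt = regexprep(txt,'^\s*\%.*$','');   % lines that are only a comment become empty
txt = regexprep(txt,'^\s*$','');       % whitespace-only lines become empty
txt(cellfun(@isempty,txt)) = [];
% drop anything after a line continuation (comment text may follow the ellipsis)
txt = regexprep(txt,'(?<=\.\.\.)\s*\%*.*$','');
% join into one string so continuations can be merged across line breaks
str = join(string(txt)+newline,"");
str = regexprep(str,'\.\.\.\s*\n\s*',' ');
txt = split(str,newline);
txt(strlength(txt)==0) = []; % split leaves an empty element after the final newline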
This does not address the more interesting part of the problem: detecting variable names and replacing them with shorter ones.
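A minimal sketch of that renaming step, under two loud assumptions: the list of local variable names has already been determined by some other means, and none of those names occur inside quoted text or comments. The short names follow the v000, v001, ... pattern visible in the minified output above; the input line and the names list are hypothetical.

```matlab
% Hypothetical example input: one line of already-stripped code.
code  = 'idx=find(s~=0,1);idx2=s.idx+idx;';
% Assumed to be known already: the local variable names to shorten.
names = {'idx','idx2','s'};
for k = 1:numel(names)
    newname = sprintf('v%03d', k-1);   % v000, v001, ... as in the output above
    % Whole-identifier match: not preceded by a word character or a dot
    % (so the struct field in s.idx is left alone), and not followed by a
    % word character (so 'idx' does not match the start of 'idx2').
    pattern = ['(?<![\w.])' names{k} '(?!\w)'];
    code = regexprep(code, pattern, newname);
end
disp(code)   % v000=find(v002~=0,1);v001=v002.idx+v000;
```

The lookarounds stand in for word boundaries; lookbehind is supported by MATLAB's regexprep and by Octave's PCRE-based one. A real tool would also have to make sure the generated names do not collide with identifiers already present in the code.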
5 comments
J. Alex Lee on 8 Dec 2020
Ah, I see... so it will be worth my while to try to understand the code.
Relatedly, I'm curious to what extent this exercise is related to how text editors implement syntax highlighting. Are they doing something equally sophisticated in the background?
Rik on 8 Dec 2020
Yes, I intended it to be in a format that you would be able to understand.
I suspect all editors are either doing something similar to what I'm doing, or something smarter. It is hard to see visually what Notepad++ is doing exactly with the edge cases, but they must have implemented something to this effect, although for them it isn't as important to get it right.
GNU Octave's editor is built on QScintilla, a Qt port of the same Scintilla editing component that Notepad++ uses, so that is no extra help. Both Octave and Notepad++ are published under the GPL, so you could in principle dig through the source code, but that is above my pay grade.
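As a rough illustration of the edge cases discussed here, the sketch below scans each line character by character and reports where a comment (%) or a line continuation (...) starts, ignoring both inside char and string literals. It assumes a simplified transpose rule (a ' directly after an identifier character, a closing bracket, a dot, or another quote is treated as the transpose operator) and ignores block comments (%{ ... %}), command syntax, and Octave-only syntax such as # comments and backslash escapes. The test lines are made up, and this is not the algorithm used by readfile, Notepad++, or any other tool.

```matlab
% Hypothetical test lines (each char literal doubles its single quotes).
lines = {'x=a''; s=''50% off''; % real comment', ...
         'y=[s "100%"];', ...
         'z=f(a, ... first arg', ...
         'b=c.'' + d''; % tally'};
starts = zeros(1, numel(lines));    % 0 means no comment or continuation found
for n = 1:numel(lines)
    ln = lines{n};
    quote = '';                     % open quote character, '' when outside a literal
    k = 1;
    while k <= numel(ln)
        c = ln(k);
        if ~isempty(quote)
            % Inside a literal: only a matching quote matters.
            if c == quote
                if k < numel(ln) && ln(k+1) == quote
                    k = k + 1;      % doubled quote is an escaped quote; skip it
                else
                    quote = '';     % closing quote
                end
            end
        elseif c == '%'
            starts(n) = k;          % comment starts here
            break
        elseif k + 2 <= numel(ln) && strcmp(ln(k:k+2), '...')
            starts(n) = k;          % continuation; the rest of the line is ignored
            break
        elseif c == '"'
            quote = '"';            % double-quoted string starts
        elseif c == ''''
            % Simplified transpose rule: a quote directly after an identifier
            % character, closing bracket, dot or another quote is a transpose.
            isTranspose = k > 1 && (isletter(ln(k-1)) || ...
                any(ln(k-1) == '0123456789_)]}.'''));
            if ~isTranspose
                quote = '''';       % single-quoted char vector starts
            end
        end
        k = k + 1;
    end
end
disp(starts)   % [20 0 8 13] for the lines above
```

Deciding whether ' opens a literal or transposes is the crux: a wrong guess flips the quote state and misclassifies every % later on the line, which is exactly the kind of edge case a highlighter can afford to get wrong but a minifier cannot.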



Release

R2020b
