Finding strings with common character

UWM on 22 Dec 2023
Commented: UWM on 22 Dec 2023
I would like to find all unique strings in a text file that share a common first character, e.g. "G". By unique I mean without repetition: if the same string occurs several times, I need to print it only once.
Any help would be appreciated.
  1 comment
madhan ravi on 22 Dec 2023
Edited: madhan ravi on 22 Dec 2023
Give an example or attach your text file and show the expected result.


Accepted Answer

Hassaan on 22 Dec 2023
Edited: Hassaan on 22 Dec 2023
You can use a regular expression to separate the strings and then filter out the unique ones that start with 'G'.
% Specify the file name and the common character
filename = 'yourfile.txt'; % Replace with your text file name
commonChar = 'G'; % Replace with the common character you're looking for
% Open the text file for reading
fileID = fopen(filename, 'r');
if fileID == -1
    error('File not found or permission denied');
end
% Read the entire file content as a single character vector
fileContent = fscanf(fileID, '%c');
fclose(fileID); % Close the file after reading
% Build a pattern matching commonChar followed by any word characters;
% regexptranslate escapes commonChar in case it is a regex metacharacter
pattern = [regexptranslate('escape', commonChar) '\w*'];
allMatches = regexp(fileContent, pattern, 'match');
% Find unique strings (unique also sorts them)
uniqueStrings = unique(allMatches);
% Print the unique strings
disp(['Unique strings starting with the character ' commonChar ':']);
for i = 1:length(uniqueStrings)
    disp(uniqueStrings{i});
end
Input file content:
Gabc
abcde
G123
G123Gabc
G123Gabc
G123Gabc
Yo123G321Yo
Output:
Unique strings starting with the character G:
G123
G123Gabc
G321Yo
Gabc
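Note that the pattern above also matches a "G" in the middle of a word, which is why G321Yo appears in the output. If only whole words that begin with the character are wanted, one possible variation is to anchor the pattern at a word start with \<. The following is a minimal sketch under that assumption; the sample text is typed in directly rather than read from a file, and the variable names are only illustrative.

```matlab
% Sample text typed in directly (stands in for the file content)
fileContent = sprintf('Gabc\nabcde\nG123\nG123Gabc\nYo123G321Yo');
commonChar = 'G';

% \< anchors the match at the start of a word, so a 'G' in the middle
% of a word (as in Yo123G321Yo) is not matched
pattern = ['\<' regexptranslate('escape', commonChar) '\w*'];
wholeWords = unique(regexp(fileContent, pattern, 'match'));

% Print each unique word on its own line
for i = 1:numel(wholeWords)
    disp(wholeWords{i});
end
```

For this sample text the loop should print G123, G123Gabc and Gabc; G321Yo is left out because its "G" is not at the start of a word.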
------------------------------------------------------------------------------------------------------------------------------------------------
If you find the solution helpful and it resolves your issue, it would be greatly appreciated if you could accept the answer. Also, leaving an upvote and a comment are also wonderful ways to provide feedback.
  2 comments
Hassaan on 22 Dec 2023
One of the many approaches without using regexp:
% The character to search for
searchChar = 'G';
% Specify the file name
filename = 'code.txt'; % Replace with your text file name
% Open the text file for reading
fileID = fopen(filename, 'r');
if fileID == -1
    error('File not found or permission denied');
end
% Read the entire file content as a single character vector
fileContent = fscanf(fileID, '%c');
fclose(fileID); % Close the file after reading
% Remove newlines and carriage returns
fileContent = strrep(fileContent, newline, '');
fileContent = strrep(fileContent, char(13), ''); % Carriage return
% Split the text into pieces, using 'G' as the delimiter
words = strsplit(fileContent, searchChar);
% The first piece is whatever precedes the first 'G', so discard it
words(1) = [];
% Drop empty pieces, then reattach 'G' to the start of each remaining one
words = words(~cellfun('isempty', words));
words = strcat(searchChar, words);
% Find unique words that start with 'G'
uniqueWords = unique(words);
% Print the unique strings, one per line
disp(['Unique strings starting with the character ' searchChar ':']);
for i = 1:length(uniqueWords)
    disp(uniqueWords{i});
end
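This approach splits the text at every occurrence of searchChar, removes the empty pieces that strsplit produces, and puts searchChar back at the start of each remaining piece. It then finds the unique results and prints them. Because line breaks are removed before splitting, text from one line can run into the next (Gabc followed by abcde becomes Gabcabcde in the output below). Make sure to adjust the filename to the actual file you're reading from.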
Input file content:
Gabc
abcde
G123
G123Gabc
G123Gabc
G123Gabc
Yo123G321Yo
G123GabcYo123G321Yo
Output:
Unique strings starting with the character G:
G123
G321Yo
Gabc
GabcYo123
Gabcabcde
UWM on 22 Dec 2023
Thank you very much for help. Works perfectly.


More Answers (3)

Steven Lord on 22 Dec 2023
Read the data into MATLAB, split it into separate words if necessary, then use startsWith to determine which words start with your desired character.
L = readlines('bench.dat');
oneLine = L(1) % Just operate on the first line
oneLine = "MATLAB(R) Benchmark Data."
s = split(oneLine)
s = 3×1 string array
    "MATLAB(R)"
    "Benchmark"
    "Data."
startsWithB = startsWith(s, "B")
startsWithB = 3×1 logical array
   0
   1
   0
wordStartingWithB = s(startsWithB)
wordStartingWithB = "Benchmark"
The unique function likely will be useful to you as well.
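Putting those pieces together for a whole text rather than a single line might look like the following minimal sketch. The sample text and the variable names are made up for illustration and are not taken from the question's file.

```matlab
% Hypothetical sample text standing in for the file contents
txt = "Gabc abcde G123" + newline + "Gabc Gdef G123";

% With no delimiter given, split breaks at whitespace (spaces and newlines)
words = split(txt);

% Keep only the words whose first character is "G"
gWords = words(startsWith(words, "G"));

% unique removes the repeats and returns the result sorted
uniqueG = unique(gWords)
```

For a real file, txt could come from fileread or readlines instead of being typed in.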

Hassaan on 22 Dec 2023
Edited: Hassaan on 22 Dec 2023
To achieve this in MATLAB, you would typically read the text file into a string array or cell array, then use string manipulation functions to find and list the unique strings. Here's a step-by-step guide with code snippets:
Read the Text File: Load the contents of the text file into MATLAB.
filename = 'yourfile.txt'; % Replace with your text file name
fileID = fopen(filename, 'r');
data = textscan(fileID, '%s');
fclose(fileID);
extractedStrings = data{1};
Filter Strings by First Character: Find strings that start with the specified character.
commonChar = 'G'; % Replace with the common character you're looking for
startsWithG = strncmp(extractedStrings, commonChar, 1);
filteredStrings = extractedStrings(startsWithG);
Find Unique Strings: Get the unique strings from the filtered list.
uniqueStrings = unique(filteredStrings);
Print Unique Strings: Display or print the unique strings.
disp(uniqueStrings);
In MATLAB, you can run this script after replacing 'yourfile.txt' with the actual path to your text file and commonChar with the character you're interested in. It prints all the strings that start with that character, each one only once.
Full Code:
% Specify the file name and the common character
filename = 'yourfile.txt'; % Replace with your text file name
commonChar = 'G'; % Replace with the common character you're looking for
% Open the text file for reading
fileID = fopen(filename, 'r');
if fileID == -1
    error('File not found or permission denied');
end
% Read the content of the file into a cell array of strings
data = textscan(fileID, '%s');
fclose(fileID); % Close the file after reading
extractedStrings = data{1}; % Extract the strings from the cell array
% Filter strings by the first character
startsWithCommonChar = strncmp(extractedStrings, commonChar, 1);
% Get the unique strings that start with the specified character
filteredStrings = extractedStrings(startsWithCommonChar);
uniqueStrings = unique(filteredStrings);
% Print the unique strings
disp('Unique strings starting with the specified character:');
disp(uniqueStrings);
Input file content:
Gabc
abcde
G123
Output:
Unique strings starting with the specified character:
{'G123'}
{'Gabc'}
If you need the output as a simple list without the curly braces and single quotes, you can loop through the cell array and print each string:
disp('Unique strings starting with the character G:');
for i = 1:length(uniqueStrings)
    disp(uniqueStrings{i});
end
Input file content:
Gabc
abcde
G123
Output:
Unique strings starting with the character G:
G123
Gabc
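One detail worth knowing: unique sorts its result by default, which is why G123 is listed before Gabc even though Gabc comes first in the file. If the order of first appearance matters, the 'stable' option of unique keeps it. A small sketch, using a hand-written cell array as a hypothetical stand-in for what textscan returns:

```matlab
% Hypothetical word list standing in for extractedStrings from textscan
extractedStrings = {'Gabc'; 'abcde'; 'G123'; 'Gabc'};
commonChar = 'G';

% Keep only the entries whose first character matches commonChar
startsWithCommonChar = strncmp(extractedStrings, commonChar, 1);
filteredStrings = extractedStrings(startsWithCommonChar);

% Default behaviour: duplicates removed, result sorted
sortedUnique = unique(filteredStrings);

% 'stable': duplicates removed, order of first appearance preserved
stableUnique = unique(filteredStrings, 'stable');
```

Here sortedUnique holds G123 then Gabc, while stableUnique holds Gabc then G123.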
  4 comments
Dyuman Joshi on 22 Dec 2023
You've only updated for the 2nd point I raised.
Say the input is -
G123Gabc
Yo123G321Yo
What should be the output then?
Hassaan on 22 Dec 2023
Edited: Hassaan on 22 Dec 2023
Input file content:
Gabc
abcde
G123
G123Gabc
G123Gabc
G123Gabc
Yo123G321Yo
Output:
Unique strings starting with the character G:
G123
G123Gabc
G321Yo
Gabc
Provided the new code snippet below as a new answer.



Paul on 22 Dec 2023
type Gfile.txt
Gabc abc abc Gabc Gdef Gdef Gabc GGG Gxyz
% assuming strings to return are space delimited
text = split(string(fileread('Gfile.txt')));
unique(text(startsWith(text,"G")))
ans = 4×1 string array
    "GGG"
    "Gabc"
    "Gdef"
    "Gxyz"
