To do k-means clustering, you need a 2D array of information about the characteristics of each location you are clustering. Each location being clustered needs its own row in the array, and each characteristic gets its own column.
In classic examples such as the Fisher Iris dataset, the information reflects different physical characteristics such as the width of the petals. The Fisher Iris dataset does not include any information about where geographically any particular sample was collected, so position information is not always necessary.
When it comes to images, it is common that the only information you have for a pixel is the row number, column number, and either grayscale or R G B components. In some cases, such as determining the dominant coloring, the position within the image is irrelevant (the same colors would be dominant if you were to sort the pixels somehow). In such a case you would just include the color information with no position information.
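As a rough sketch of the color-only case, the following builds one row per pixel with just the three color components and takes the largest cluster as the dominant color. The synthetic image, the value of NUMBER_OF_CLUSTERS, and the variable names are assumptions made up for illustration; kmeans requires the Statistics and Machine Learning Toolbox.

```matlab
% Sketch only: dominant color using color information with no positions.
rng(0);                                  % reproducible cluster initialization

% Assumption: a small synthetic RGB image stands in for a real photo.
img = zeros(40, 60, 3, 'uint8');
img(:, :, 1) = 200;                      % reddish background
img(10:20, 15:30, 1) = 0;                % blue patch: remove the red ...
img(10:20, 15:30, 3) = 220;              % ... and add blue

NUMBER_OF_CLUSTERS = 2;                  % assumption: picked for this toy image

% One row per pixel, one column per color channel (N-by-3).
data = double(reshape(img, [], 3));

[idx, centers] = kmeans(data, NUMBER_OF_CLUSTERS);

% The cluster containing the most pixels gives the dominant color.
counts = accumarray(idx, 1, [NUMBER_OF_CLUSTERS, 1]);
[~, biggest] = max(counts);
dominantColor = uint8(round(centers(biggest, :)));
```

Because no coordinates are in the array, shuffling the pixels would give the same clusters.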
In a case such as yours, if the task were to find potholes filled with water rather than potholes in general, then it could help a lot to have a multispectral image with an infrared channel.
But you haven't defined "pothole". What is a "pothole" for this purpose? If there were a single isolated wet pixel, would that constitute a "pothole" by itself? Or is there a minimum physical size of adjacent pixels needed to count as a pothole? If there were a long narrow crack in the road, could that be considered a pothole, or is there a minimum length and width? If there is a dry dent in the road, should that be considered a pothole?
If geography does not matter, that is, if you can determine potholes on a pixel-by-pixel basis without reference to adjacent pixels, then you will need to construct an information array that includes, for every pixel, the x and y coordinates of the pixel as well as any intensity information you have available for the pixel. It might help to use RGB instead of grayscale. For example,
[idx, centers] = kmeans(data, NUMBER_OF_CLUSTERS);
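The data array in the call above is not shown being built. One possible way to assemble it is sketched below, assuming an RGB image; the random test image, NUMBER_OF_CLUSTERS value, and variable names are placeholders I made up. Note that coordinates and intensities are in different units, so you might want to rescale the columns (here with normalize, available in R2018a and later) before clustering; whether that helps is a judgment call.

```matlab
% Sketch only: one row per pixel, columns [x, y, R, G, B].
rng(0);                                        % reproducible example

% Assumption: a random RGB image stands in for the road photo.
img = randi([0 255], 40, 60, 3, 'uint8');
[nRows, nCols, ~] = size(img);

% X holds the column index of each pixel, Y holds the row index.
[X, Y] = meshgrid(1:nCols, 1:nRows);

% reshape and (:) both walk the image in column-major order,
% so coordinates and colors stay lined up row for row.
colors = double(reshape(img, [], 3));
data = [X(:), Y(:), colors];

% Optional: put every column on a comparable scale (z-score per column).
dataScaled = normalize(data);

NUMBER_OF_CLUSTERS = 4;                        % assumption: arbitrary choice
[idx, centers] = kmeans(dataScaled, NUMBER_OF_CLUSTERS);

% Put the cluster labels back into image shape for display.
labelImage = reshape(idx, nRows, nCols);
```

You could then look at the result with something like imagesc(labelImage).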
Frankly, the results should be expected to be crud. Chances are high that you want to determine "pothole" by some kind of region information, such as by searching for objects with particular aspect ratios.
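To give an idea of what a region-based test could look like, here is a minimal sketch using regionprops from the Image Processing Toolbox. It assumes you already have a binary mask of candidate pixels from some earlier step (thresholding, or picking one of the kmeans clusters); the synthetic mask and the MIN_AREA and MAX_ASPECT_RATIO thresholds are invented for illustration, not recommended values.

```matlab
% Sketch only: filter connected regions by size and aspect ratio.
% Assumption: a synthetic mask stands in for real candidate pixels.
mask = false(100, 100);
mask(20:39, 20:44) = true;     % compact blob, 20-by-25 pixels
mask(60:61, 10:89) = true;     % long narrow "crack", 2-by-80 pixels
mask(5, 5) = true;             % single isolated pixel

MIN_AREA = 50;                 % assumption: minimum pixel count for a pothole
MAX_ASPECT_RATIO = 3;          % assumption: anything longer counts as a crack

% Drop connected components with fewer than MIN_AREA pixels.
mask = bwareaopen(mask, MIN_AREA);

% Measure what is left.
stats = regionprops(mask, 'Area', 'BoundingBox', ...
                    'MajorAxisLength', 'MinorAxisLength');

% Ratio of the axes of the ellipse fitted to each region.
aspect = [stats.MajorAxisLength] ./ [stats.MinorAxisLength];

% Keep only regions that are not too elongated.
candidates = stats(aspect <= MAX_ASPECT_RATIO);
```

Here the isolated pixel is removed by the area test and the crack by the aspect ratio test, which is why the definition questions above matter: they decide what those thresholds should be.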