Hi
We have installed Hadoop on two Linux (Ubuntu) machines (two DataNodes, one NameNode). Now we want to access the data from a third computer, running Windows, where our MATLAB R2014b is installed.
We have two questions:
1. How should we specify the environment variable (HADOOP_PREFIX) on our Windows machine?
2. Do we need to install Hadoop on our Windows machine?
Thanks for your support.

2 comments

Siddharth Sundar on 13 Oct 2014
The error suggests that datastore hasn't been able to read the folder that contains the customer's files. My suggestion for a first step is to check the permissions in HDFS. HDFS, the filesystem that is part of Hadoop, has POSIX-like permissions. The folder will be owned by user 'hadoop', and it is possible that its permissions are set so that other users cannot access it.
What username are you running MATLAB as? If it is not 'hadoop', then run the following in a Linux terminal window:
/home/hadoop/hadoop-1.2.1/bin/hadoop fs -ls /user/hadoop
If this fails, or if it returns something like:
drwx------ - hadoop supergroup ... /user/hadoop/airline
Then you need to correct the permissions in your filesystem.
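The permission fix suggested above could look like the following. This is a sketch only: the installation path, the `/user/hadoop/airline` folder, and the user name `hduser` are assumptions based on this thread, so substitute your own.

```shell
# Check permissions on the data directory (run as the HDFS superuser 'hadoop'):
/home/hadoop/hadoop-1.2.1/bin/hadoop fs -ls /user/hadoop

# Option 1: open read/execute access so other users (e.g. the MATLAB client)
# can traverse and read the folder:
/home/hadoop/hadoop-1.2.1/bin/hadoop fs -chmod -R 755 /user/hadoop/airline

# Option 2: hand ownership to the user MATLAB runs as (hypothetical user 'hduser'):
/home/hadoop/hadoop-1.2.1/bin/hadoop fs -chown -R hduser /user/hadoop/airline
```

Either option avoids the `drwx------` situation where only the 'hadoop' user can enter the folder.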
Does this work for you?
Ludwig Drees on 16 Oct 2014
Hi and thanks for your answer.
We checked the permissions in a web browser, using the same IP address and port number we used in the datastore command. For the file it looks like this:
Permission Owner Group Size Replication Block Size Name
-rw-r--r-- hduser supergroup 134.62 MB 3 128 MB data.csv
So everything seems to be fine, right? I have also created a new user called hduser on our Windows machine, but it still does not work.


Accepted Answer

Javensius Sembiring on 21 Oct 2014


Hi Kalsi,
Thanks for your feedback. Ludwig and I are currently working on this project. The problem was that the entry in core-site.xml containing the NameNode address (fs.default.name) still referred to a local IP address, whereas MATLAB requires an address that directly reaches the HDFS system. By changing fs.default.name to the public IP address, MATLAB is now able to connect to the HDFS storage system.
The MATLAB-Hadoop configuration we are developing consists of three computers connected to each other over a private network. All three computers run Ubuntu with Hadoop installed. One of them has two network cards, one for the local connection and the other for the public connection.
The public network card is used by the client computer to access this Hadoop cluster. The problem is that when we change fs.default.name (the NameNode address) to the public IP address, Hadoop cannot start the other two DataNodes, since they refer to the NameNode's local IP address. I know this is not a MATLAB-related problem, but do you know how to configure it correctly?
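One common way around this kind of split-address problem (a sketch only, not from this thread; the hostname `namenode-host` and port are assumptions) is to point fs.default.name at a hostname rather than a raw IP, and let each machine resolve that hostname to the interface it should use:

```xml
<!-- core-site.xml on every node: refer to the NameNode by hostname -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://namenode-host:9000</value>
</property>
```

In /etc/hosts on the DataNodes, map `namenode-host` to the NameNode's private IP; on the Windows/MATLAB client (in C:\Windows\System32\drivers\etc\hosts), map the same name to the public IP. The DataNodes then keep talking over the private network while the client resolves the identical URI to the public address.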
Thanks in advance,

2 comments

Aaditya Kalsi on 22 Oct 2014
I'm not an expert here, but this post may be useful.
Javensius Sembiring on 23 Oct 2014
Hi,
Thanks for the link.


More Answers (2)

Aaditya Kalsi on 15 Oct 2014


You do need to install Hadoop on your Windows machine and provide that installation path to MATLAB on the same machine through the HADOOP_PREFIX environment variable.
To specify the environment variable on your Windows machine, try:
setenv('HADOOP_PREFIX', 'C:\path\to\hadoop_installation')
ds = datastore('hdfs://host/path/to/file.txt', ...)

3 comments

Ludwig Drees on 16 Oct 2014 (edited 16 Oct 2014)
Hi and thanks for your input.
We basically tried two things:
1) We installed Hadoop on our Windows machine with the same version (2.5.1) as on the remote Linux computer, but we still get the error:
Error using datastore (line 70)
Failure occured while trying to resolve path 'hdfs://remote-server:9000/folder_name/file_name.csv'.
2) We shared the Hadoop installation folder from the remote Linux machine, but that does not work either.
Sometimes we also get the following error:
log4j:WARN No appenders could be found for logger (org.apache.hadoop.security.authentication.util.KerberosName).
log4j:WARN Please initialize the log4j system properly.
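The log4j warnings are usually harmless on their own; they just mean the Hadoop client found no log4j configuration on its classpath. A minimal log4j.properties that silences them might look like this (a sketch; the assumption is that the file is placed in the Hadoop conf directory seen by the client):

```properties
# Route all logging at WARN and above to the console.
log4j.rootLogger=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{ISO8601} %-5p %c - %m%n
```

This addresses only the logging warnings, not the underlying path-resolution error, which is more likely the host/port or permission issue discussed above.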
Thanks in advance.
Aaditya Kalsi on 16 Oct 2014
Could you provide the configuration details? Are the host and port correct, and is the path known to exist?
It might also help to ensure that the server name and port are exactly the same as the fs.default.name in your Hadoop configuration file.
If you're not on the same network, you may have to fully qualify the hostname.
Hope this helps.
Ludwig Drees on 22 Oct 2014
My colleague Javensius has provided additional information about our configuration (see answer below).
Thanks.


yuan xin on 28 Sep 2016


So what was the final solution to the problem?


Asked: 10 Oct 2014
Answered: 28 Sep 2016
