Tuesday, October 22, 2019

Delete Old log files from HDFS - Older than 10 days

Script (tested):


#!/bin/bash
# Delete HDFS directories whose modification date is more than 10 days old.
today=$(date +'%s')

hdfs dfs -ls /file/Path/ | grep "^d" | while read -r line ; do
    # Column 6 of 'hdfs dfs -ls' output is the modification date; column 8 is the path.
    dir_date=$(echo "${line}" | awk '{print $6}')
    difference=$(( ( today - $(date -d "${dir_date}" +%s) ) / (24*60*60) ))
    filePath=$(echo "${line}" | awk '{print $8}')

    if [ "${difference}" -gt 10 ]; then
        hdfs dfs -rm -r "${filePath}"
    fi
done
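
To preview what the script would remove before enabling the actual deletion, one option (a dry-run sketch, not part of the tested script above) is to replace the hdfs dfs -rm -r line with a plain echo:

    echo "Would delete: ${filePath} (age: ${difference} days)"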



Note: Change /file/Path/ to your target path. The script does not skip the trash, so removed files are moved to the HDFS trash rather than deleted immediately.
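
To reclaim the space immediately instead, add the -skipTrash flag to the delete line:

    hdfs dfs -rm -r -skipTrash "${filePath}"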

Tuesday, October 15, 2019

NameNode HA fails over due to connection interruption with JournalNodes

Issue:

Occasionally NameNode HA fails over because of a network connection interruption between the active NameNode and the JournalNodes.

The error message in the NameNode log:

2015-11-06 17:39:09,497 FATAL namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(398)) - Error: starting log segment 2710407 failed for required journal (JournalAndStream(mgr=QJM to [172.xxx.xxx.xxx:8485, 172.xxx.xxx.xxx:8485, 172.xxx.xxx.xxx:8485], stream=null))
java.io.IOException: Timed out waiting 20000ms for a quorum of nodes to respond.
at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:137)
at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.startLogSegment(QuorumJournalManager.java:403)
at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalAndStream.startLogSegment(JournalSet.java:107)
at org.apache.hadoop.hdfs.server.namenode.JournalSet$3.apply(JournalSet.java:222)

at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
2015-11-06 17:39:09,500 INFO util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 1
2015-11-06 17:39:09,506 INFO namenode.NameNode (StringUtils.java:run(659)) - SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at <FQDN>/172.xxx.xxx.xxx
************************************************************

Cause:

A network interruption between the active NameNode and a quorum of the JournalNodes. When the NameNode cannot reach a quorum within the write timeout (20000 ms by default, as seen in the log above), it aborts and the standby takes over.


Solution:

Increase the journal quorum write timeout to 60 seconds by adding the
following property to the HDFS configuration (hdfs-site.xml):

Property Name:  dfs.qjournal.write-txns.timeout.ms
Property Value: 60000
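
If you edit hdfs-site.xml directly rather than through a management UI, the entry would look like the snippet below; the NameNodes need a restart for it to take effect:

    <property>
      <name>dfs.qjournal.write-txns.timeout.ms</name>
      <value>60000</value>
    </property>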

While running a Tez job, it fails with a 'Vertex failed' error

The error below is seen while a Hive query runs in Tez execution mode:

Logs:


Vertex failed, vertexName=Reducer 34, vertexId=vertex_1424999265634_0222_1_23, diagnostics=[Task failed, taskId=task_1424999265634_0222_1_23_000007, diagnostics=[AttemptID:attempt_1424999265634_0222_1_23_000007_0 Info:Error: java.lang.RuntimeException: java.lang.RuntimeException: Reduce operator initialization failed
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:188)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:307)
at org.apache.hadoop.mapred.YarnTezDagChild$5.run(YarnTezDagChild.java:564)
at java.se
... 6 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: : init not supported
at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFStreamingEvaluator.init(GenericUDAFStreamingEvaluator.java:70)

at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.init(ReduceRecordProcessor.java:160)
... 7 more

CAUSE:

The Tez containers are not allocated enough memory to run the query.

SOLUTION:

Set the following configuration values in the Hive session to increase the memory:

tez.am.resource.memory.mb=4096
tez.am.java.opts=-server -Xmx3276m -Djava.net.preferIPv4Stack=true -XX:+UseNUMA -XX:+UseParallelGC

hive.tez.container.size=4096
hive.tez.java.opts=-server -Xmx3276m -Djava.net.preferIPv4Stack=true -XX:+UseNUMA -XX:+UseParallelGC

Note:
- Ensure that the -Xmx values in the *.java.opts settings are roughly 80% of the corresponding *.mb / hive.tez.container.size values, and that the NodeManagers can actually allocate containers of that size.
- If the issue happens again, increase the values above by one minimum container size (yarn.scheduler.minimum-allocation-mb) and rerun the query.
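
For example, the values above can be applied at the top of the Hive script, before the failing query:

set tez.am.resource.memory.mb=4096;
set tez.am.java.opts=-server -Xmx3276m -Djava.net.preferIPv4Stack=true -XX:+UseNUMA -XX:+UseParallelGC;
set hive.tez.container.size=4096;
set hive.tez.java.opts=-server -Xmx3276m -Djava.net.preferIPv4Stack=true -XX:+UseNUMA -XX:+UseParallelGC;
-- then re-run the query that failed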

Monday, October 14, 2019

How to limit the number of mappers for a MapReduce job via Java code?

The number of mappers launched is based on the input split size.

The input split is the chunk of data handed to each mapper as the input is read from HDFS (by default, one split per HDFS block).

So in order to control the number of mappers, we have to control the split size.

Now, there are two properties we can look into (sizes in bytes):
- mapred.min.split.size
- mapred.max.split.size

(In newer Hadoop releases these are named mapreduce.input.fileinputformat.split.minsize and mapreduce.input.fileinputformat.split.maxsize; the old names still work as deprecated aliases.)

For example, if we have a 20 GB (20480 MB) file and we want to launch 40 mappers, then the split size should be 20480 MB / 40 = 512 MB each.

So for that the code would be:  

conf.set("mapred.min.split.size", "536870912");
conf.set("mapred.max.split.size", "536870912");  

Setting "mapred.map.tasks" is just a directive on how many mapper we want to launch. It doesn't guarantee that many mappers. 



More information on this topic:  http://wiki.apache.org/hadoop/HowManyMapsAndReduces

Hive Architecture

Hive Architecture in One Image [diagram not reproduced here]