HDFS UI detailed description

Posted on 2019-02-09 | Edited on 2020-08-01 |

HDFS UI: Open the HDFS UI in a browser; the URL is {hadoop_master_ip:50070}. This article aims to explain everything in the HDFS Overview UI in detail, oth ...

Basic understanding of debian packaging

Posted on 2019-02-07 | Edited on 2020-08-02 |

Background: Setting up, installing, and upgrading environments never becomes easy. This article aims to describe how to use and create Debian packages to help setup ...

hadoop cluster incidents

Posted on 2019-01-25 | Edited on 2020-08-01 |

Purpose: Operating a Hadoop cluster is not easy. There are many parameters, and no perfect configuration fits all situations. Monitorin ...

Very basic understanding of CAP theorem

Posted on 2019-01-19 | Edited on 2020-08-01 |

Wiki description: According to the wiki, the CAP theorem explains that it is impossible for a distributed data store to simultaneously provide more than two ...

hive GenericUDF and DeferredJavaObject analysis

Posted on 2019-01-09 | Edited on 2020-08-01 |

Background: This article discusses how the Hive generic user-defined function (GenericUDF) works. The Javadoc says GenericUDF can do short-c ...

create git hooks in mac

Posted on 2018-12-15 | Edited on 2020-08-01 | In git |

This article explains how to create git hooks on a Mac and how to use customized git hook chains. Preparation for creating hook chain: (This part ref ...

hadoop local mode and distributed mode

Posted on 2018-12-09 | Edited on 2020-08-01 | In big data |

Whether a job runs in local mode or distributed mode is decided by mapreduce.framework.name. In local mode, the mapper and reducer will run locally in ...

hive configuration understanding

Posted on 2018-11-19 | Edited on 2020-08-01 | In big data |

This article introduces the manually configured settings that override the defaults when using Hive. Environment: This article is bas ...

hive scratch directory

Posted on 2018-11-17 | Edited on 2020-08-01 | In big data |

This article explains the Hive scratch directory. Scratch directory usage: The Hive scratch directory is a temporary working space for storing the pla ...

How to fast fail hive jobs

Posted on 2018-11-15 | Edited on 2020-08-04 | In big data |

Background: When running Hive jobs on MapReduce in Hadoop clusters, we always set a limit on how much local and HDFS disk space a job can use. ...

Wang Yan
