
HBase Start Process (HBase 2.2) - Region Assign

by 잇서니 2021. 9. 8.

 

 

 

While writing up how an HBase cluster starts, I want to:

  • understand the steps by which Regions are assigned.
  • identify the cause of the abnormal behavior that occasionally occurs when HBase is restarted.

 

Changes in HBase 2.2+

  • HBase 2.2+ uses a new Procedure form for assigning/unassigning/moving Regions.
  • It does not process HBase 2.1 and 2.0’s Unassign/Assign Procedure types.

 

I traced the HBase cluster startup sequence by reading the HBase Master log.

 

 

1. Process startup and ZooKeeper connection


 

 

2. WAL processing


  • Recover lease on hdfs (MasterProcWALs/728)
  • Rolled new Procedure Store WAL (729)
  • Read hdfs MasterProcWALs (728)
  • Rolled new Procedure Store WAL (730)
  • Remove all state logs with ID less than 729, since no active procedures
  • Archiving hdfs:///hbase/MasterProcWALs/728, 729 to oldWALs
  • Loaded WALProcedureStore

 

 

 

 

3. Scheduling Procedures (Server Crash Procedure)


# One procedure per RegionServer (pid = 1 ~ 6)
2021-09-03 13:25:19,055 INFO org.apache.hadoop.hbase.master.ServerManager: Processing expiration of pay-hubble-gsw06.dakao.io,16020,1630633919867 on pay-hubble-gsm01.dakao.io,16000,1630643114896
2021-09-03 13:25:19,215 INFO org.apache.hadoop.hbase.master.assignment.AssignmentManager: Scheduled SCP pid=1 for pay-hubble-gsw06.dakao.io,16020,1630633919867 (carryingMeta=true) pay-hubble-gsw06.dakao.io,16020,1630633919867/CRASHED/regionCount=1/lock=java.util.concurrent.locks.ReentrantReadWriteLock@1c801c1[Write locks = 1, Read locks = 0], oldState=ONLINE.


2021-09-03 13:25:19,216 INFO org.apache.hadoop.hbase.master.ServerManager: Processing expiration of pay-hubble-gsw02.dakao.io,16020,1630633919945 on pay-hubble-gsm01.dakao.io,16000,1630643114896
2021-09-03 13:25:19,319 INFO org.apache.hadoop.hbase.master.assignment.AssignmentManager: Scheduled SCP pid=2 for pay-hubble-gsw02.dakao.io,16020,1630633919945 (carryingMeta=false) pay-hubble-gsw02.dakao.io,16020,1630633919945/CRASHED/regionCount=0/lock=java.util.concurrent.locks.ReentrantReadWriteLock@461d0c15[Write locks = 1, Read locks = 0], oldState=ONLINE.


2021-09-03 13:25:19,319 INFO org.apache.hadoop.hbase.master.ServerManager: Processing expiration of pay-hubble-gsw04.dakao.io,16020,1630633919894 on pay-hubble-gsm01.dakao.io,16000,1630643114896
2021-09-03 13:25:19,422 INFO org.apache.hadoop.hbase.master.assignment.AssignmentManager: Scheduled SCP pid=3 for pay-hubble-gsw04.dakao.io,16020,1630633919894 (carryingMeta=false) pay-hubble-gsw04.dakao.io,16020,1630633919894/CRASHED/regionCount=0/lock=java.util.concurrent.locks.ReentrantReadWriteLock@1dd7e32b[Write locks = 1, Read locks = 0], oldState=ONLINE.


2021-09-03 13:25:19,422 INFO org.apache.hadoop.hbase.master.ServerManager: Processing expiration of pay-hubble-gsw01.dakao.io,16020,1630633919889 on pay-hubble-gsm01.dakao.io,16000,1630643114896
2021-09-03 13:25:19,525 INFO org.apache.hadoop.hbase.master.assignment.AssignmentManager: Scheduled SCP pid=4 for pay-hubble-gsw01.dakao.io,16020,1630633919889 (carryingMeta=false) pay-hubble-gsw01.dakao.io,16020,1630633919889/CRASHED/regionCount=0/lock=java.util.concurrent.locks.ReentrantReadWriteLock@5c502f59[Write locks = 1, Read locks = 0], oldState=ONLINE.


2021-09-03 13:25:19,525 INFO org.apache.hadoop.hbase.master.ServerManager: Processing expiration of pay-hubble-gsw05.dakao.io,16020,1630633919880 on pay-hubble-gsm01.dakao.io,16000,1630643114896
2021-09-03 13:25:19,627 INFO org.apache.hadoop.hbase.master.assignment.AssignmentManager: Scheduled SCP pid=5 for pay-hubble-gsw05.dakao.io,16020,1630633919880 (carryingMeta=false) pay-hubble-gsw05.dakao.io,16020,1630633919880/CRASHED/regionCount=0/lock=java.util.concurrent.locks.ReentrantReadWriteLock@7e407baa[Write locks = 1, Read locks = 0], oldState=ONLINE.


2021-09-03 13:25:19,627 INFO org.apache.hadoop.hbase.master.ServerManager: Processing expiration of pay-hubble-gsw03.dakao.io,16020,1630633919893 on pay-hubble-gsm01.dakao.io,16000,1630643114896
2021-09-03 13:25:19,730 INFO org.apache.hadoop.hbase.master.assignment.AssignmentManager: Scheduled SCP pid=6 for pay-hubble-gsw03.dakao.io,16020,1630633919893 (carryingMeta=false) pay-hubble-gsw03.dakao.io,16020,1630633919893/CRASHED/regionCount=0/lock=java.util.concurrent.locks.ReentrantReadWriteLock@5093bdd3[Write locks = 1, Read locks = 0], oldState=ONLINE.

 

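This is the step in which the RegionServers carrying the previous id values are expired.

At this point a Procedure (Server Crash Procedure, SCP) is scheduled for each of those RegionServers.

The hubble cluster has six RegionServers, so six Procedures (pid=1 ~ 6) are scheduled.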
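To check the count on a larger cluster, the "Scheduled SCP" lines can be pulled out of the Master log with a regular expression. The following is a minimal sketch, not an HBase tool: the function name `scheduled_scps` is made up for illustration, and the two sample lines are abbreviated copies of the log lines above (the trailing lock details are cut off).

```python
import re

# Abbreviated copies of two "Scheduled SCP" lines from the Master log above.
LOG_LINES = [
    "2021-09-03 13:25:19,215 INFO org.apache.hadoop.hbase.master.assignment.AssignmentManager: "
    "Scheduled SCP pid=1 for pay-hubble-gsw06.dakao.io,16020,1630633919867 (carryingMeta=true)",
    "2021-09-03 13:25:19,319 INFO org.apache.hadoop.hbase.master.assignment.AssignmentManager: "
    "Scheduled SCP pid=2 for pay-hubble-gsw02.dakao.io,16020,1630633919945 (carryingMeta=false)",
]

SCP_PATTERN = re.compile(
    r"Scheduled SCP pid=(?P<pid>\d+) for (?P<server>\S+) "
    r"\(carryingMeta=(?P<meta>true|false)\)"
)


def scheduled_scps(lines):
    """Return (pid, server_name, carrying_meta) for every 'Scheduled SCP' line."""
    found = []
    for line in lines:
        match = SCP_PATTERN.search(line)
        if match:
            found.append(
                (int(match.group("pid")), match.group("server"), match.group("meta") == "true")
            )
    return found


for pid, server, carrying_meta in scheduled_scps(LOG_LINES):
    print(pid, server, "carries hbase:meta" if carrying_meta else "")
```

The `carryingMeta=true` entry is the one to watch, because its SCP is the parent of the hbase:meta assignment.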

 

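(Note)

Each RegionServer has an id value (the last field of host,port,id in the log), and it changes to a new value every time the server is started.

The RegionServer id can also be checked in the HBase Master UI.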
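The id can be compared before and after the restart by splitting the server string on commas. This is a small sketch under one assumption: the third field is read as the process start time in epoch milliseconds, which matches the log timestamps in this post. `ServerName`, `parse_server_name`, and `started_at` are names chosen for this example, not the HBase Java API.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class ServerName(NamedTuple):
    host: str
    port: int
    start_code: int  # assumed to be epoch milliseconds of the process start


def parse_server_name(text: str) -> ServerName:
    """Split 'host,port,startcode' as it appears in the Master log."""
    host, port, start_code = text.split(",")
    return ServerName(host, int(port), int(start_code))


def started_at(server: ServerName) -> datetime:
    """Interpret the start code as a UTC timestamp (assumption, see above)."""
    return datetime.fromtimestamp(server.start_code / 1000, tz=timezone.utc)


# gsw06 before and after the restart, taken from the log lines in this post.
old = parse_server_name("pay-hubble-gsw06.dakao.io,16020,1630633919867")
new = parse_server_name("pay-hubble-gsw06.dakao.io,16020,1630643113849")

same_process_slot = (old.host, old.port) == (new.host, new.port)
print("same host and port:", same_process_slot)
print("id changed:", old.start_code != new.start_code)
print("old start:", started_at(old).isoformat())
print("new start:", started_at(new).isoformat())
```

Same host and port with a different id is what makes the Master treat the old entry as a separate server that has to be expired.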

 

 

 

 

4. Starting the hbase:meta Region assignment


2021-09-03 13:25:22,853 INFO org.apache.hadoop.hbase.master.HMaster: hbase:meta {1588230740 state=OPEN, ts=1630643119046, server=pay-hubble-gsw06.dakao.io,16020,1630633919867}


# procedure (pid=1) -> ServerCrashProcedure (handles the crash of the previous meta RegionServer)
2021-09-03 13:25:22,881 INFO org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure: Start pid=1, state=RUNNABLE:SERVER_CRASH_START, locked=true; ServerCrashProcedure server=pay-hubble-gsw06.dakao.io,16020,1630633919867, splitWal=true, meta=true


# procedure (pid=7, ppid=1) -> TransitRegionStateProcedure (hbase:meta Region)
2021-09-03 13:25:22,920 INFO org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Initialized subprocedures=[{pid=7, ppid=1, state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE; TransitRegionStateProcedure table=hbase:meta, region=1588230740, ASSIGN}]
2021-09-03 13:25:22,933 INFO org.apache.hadoop.hbase.master.procedure.MasterProcedureScheduler: Took xlock for pid=7, ppid=1, state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE; TransitRegionStateProcedure table=hbase:meta, region=1588230740, ASSIGN
2021-09-03 13:25:22,936 INFO org.apache.hadoop.hbase.master.assignment.TransitRegionStateProcedure: Starting pid=7, ppid=1, state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE, locked=true; TransitRegionStateProcedure table=hbase:meta, region=1588230740, ASSIGN; rit=OPEN, location=null; forceNewPlan=true, retain=false

 

This is the step in which assignment of the hbase:meta table's Region begins.

The Procedure that processes the previous meta RegionServer as crashed starts, and it spawns a subprocedure that assigns the Region.

That crash Procedure is the one scheduled in step 3 (pid=1).

 

(Note)

When HBase starts, the Region of the hbase:meta table is assigned before any other Region. The hbase:meta Region is hosted by a single RegionServer, which is called the meta RegionServer.

 

5. Registering the new RegionServers


 

2021-09-03 13:25:25,822 INFO org.apache.hadoop.hbase.master.ServerManager: Registering regionserver=pay-hubble-gsw04.dakao.io,16020,1630643113864
2021-09-03 13:25:25,823 INFO org.apache.hadoop.hbase.master.ServerManager: Registering regionserver=pay-hubble-gsw05.dakao.io,16020,1630643113848
2021-09-03 13:25:25,824 INFO org.apache.hadoop.hbase.master.ServerManager: Registering regionserver=pay-hubble-gsw01.dakao.io,16020,1630643113867
2021-09-03 13:25:25,824 INFO org.apache.hadoop.hbase.master.ServerManager: Registering regionserver=pay-hubble-gsw03.dakao.io,16020,1630643113871
2021-09-03 13:25:25,828 INFO org.apache.hadoop.hbase.master.ServerManager: Registering regionserver=pay-hubble-gsw06.dakao.io,16020,1630643113849
2021-09-03 13:25:25,828 INFO org.apache.hadoop.hbase.master.ServerManager: Registering regionserver=pay-hubble-gsw02.dakao.io,16020,1630643113891
2021-09-03 13:25:25,846 INFO org.apache.hadoop.hbase.master.RegionServerTracker: RegionServer ephemeral node created, adding [pay-hubble-gsw02.dakao.io,16020,1630643113891]
2021-09-03 13:25:25,846 INFO org.apache.hadoop.hbase.master.RegionServerTracker: RegionServer ephemeral node created, adding [pay-hubble-gsw06.dakao.io,16020,1630643113849]
2021-09-03 13:25:25,846 INFO org.apache.hadoop.hbase.master.RegionServerTracker: RegionServer ephemeral node created, adding [pay-hubble-gsw05.dakao.io,16020,1630643113848]
2021-09-03 13:25:25,846 INFO org.apache.hadoop.hbase.master.RegionServerTracker: RegionServer ephemeral node created, adding [pay-hubble-gsw04.dakao.io,16020,1630643113864]
2021-09-03 13:25:25,846 INFO org.apache.hadoop.hbase.master.RegionServerTracker: RegionServer ephemeral node created, adding [pay-hubble-gsw01.dakao.io,16020,1630643113867]
2021-09-03 13:25:25,846 INFO org.apache.hadoop.hbase.master.RegionServerTracker: RegionServer ephemeral node created, adding [pay-hubble-gsw03.dakao.io,16020,1630643113871]

 

This is the step in which the RegionServers carrying the new id values are registered.

The new RegionServer information is stored in ZooKeeper.

znode: /hbase/rs

 

 

6. Setting the new meta RegionServer in ZooKeeper


2021-09-03 13:25:25,897 INFO org.apache.hadoop.hbase.zookeeper.MetaTableLocator: Setting hbase:meta (replicaId=0) location in ZooKeeper as pay-hubble-gsw05.dakao.io,16020,1630643113848

 

This is the step in which the meta RegionServer with its new id value is set in ZooKeeper.

The meta RegionServer information is stored in ZooKeeper.

znode: /hbase/meta-regionserver

The hbase:meta Region used to be on gsw06; after the restart the log shows it on gsw05.

If this step does not complete properly, I expect the hbase:meta Region assignment to fail as well.

 

 

7. hbase:meta Region assignment completed


# procedure(pid=8, ppid=7) Finished
2021-09-03 13:25:25,903 INFO org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Initialized subprocedures=[{pid=8, ppid=7, state=RUNNABLE; org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure}]
2021-09-03 13:25:27,225 INFO org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished subprocedure pid=8, resume processing parent pid=7, ppid=1, state=RUNNABLE:REGION_STATE_TRANSITION_CONFIRM_OPENED, locked=true; TransitRegionStateProcedure table=hbase:meta, region=1588230740, ASSIGN
2021-09-03 13:25:27,226 INFO org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished pid=8, ppid=7, state=SUCCESS; org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure in 1.3170sec


# procedure (pid=7) Finished
2021-09-03 13:25:27,331 INFO org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished pid=7, ppid=1, state=SUCCESS; TransitRegionStateProcedure table=hbase:meta, region=1588230740, ASSIGN in 4.3050sec

 

The hbase:meta Region assignment is now complete.

In the end, two Procedures run to assign the hbase:meta Region (pid=7, pid=8):

  • TransitRegionState (pid=7)
  • OpenRegion (pid=8)
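The parent/child relationship between these Procedures follows from the pid and ppid values in the log. The sketch below rebuilds that tree from the three procedures seen so far; `render_tree` is a helper written for this post, and the input tuples are transcribed by hand from the log lines above.

```python
from collections import defaultdict

# (pid, parent pid, procedure type) transcribed from the log lines above.
PROCEDURES = [
    (1, None, "ServerCrashProcedure"),
    (7, 1, "TransitRegionStateProcedure"),
    (8, 7, "OpenRegionProcedure"),
]


def render_tree(procedures):
    """Return indented lines, one per procedure, with children under parents."""
    children = defaultdict(list)
    names = {}
    for pid, ppid, name in procedures:
        names[pid] = name
        children[ppid].append(pid)

    lines = []

    def walk(pid, depth):
        lines.append("  " * depth + f"pid={pid} {names[pid]}")
        for child in sorted(children[pid]):
            walk(child, depth + 1)

    # Root procedures are the ones without a parent pid.
    for root in sorted(children[None]):
        walk(root, 0)
    return lines


print("\n".join(render_tree(PROCEDURES)))
```

A parent resumes only after its subprocedure finishes, which is why pid=8 finishes before pid=7 in the log.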

 

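If the hbase:meta Region is not assigned properly, the HBase Master does not finish starting up.

In that case no Region information can be looked up, so HBase is effectively unusable.

This is the kind of abnormal behavior on restart mentioned at the beginning of this post.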

 

 

8. Loading and updating hbase:meta


# Load the existing hbase:meta entries
2021-09-03 13:25:27,697 INFO org.apache.hadoop.hbase.master.assignment.RegionStateStore: Load hbase:meta entry region=ec3328c4416e71907739aea3b3e2bf7d, regionState=OPEN, lastHost=pay-hubble-gsw01.dakao.io,16020,1630633919889, regionLocation=pay-hubble-gsw01.dakao.io,16020,1630633919889, openSeqNum=28
2021-09-03 13:25:27,698 INFO org.apache.hadoop.hbase.master.assignment.RegionStateStore: Load hbase:meta entry region=9d4dc0e5129c6180d8f4144b997504c8, regionState=OPEN, lastHost=pay-hubble-gsw02.dakao.io,16020,1630633919945, regionLocation=pay-hubble-gsw02.dakao.io,16020,1630633919945, openSeqNum=28
...


# Update hbase:meta
2021-09-03 13:25:27,854 INFO org.apache.hadoop.hbase.master.assignment.RegionStateStore: pid=51 updating hbase:meta row=e1d5109f5f74442273ba83392c1cead1, regionState=OPENING, regionLocation=pay-hubble-gsw01.dakao.io,16020,1630643113867
2021-09-03 13:25:27,854 INFO org.apache.hadoop.hbase.master.assignment.RegionStateStore: pid=59 updating hbase:meta row=d08229ec63dfa9520e83a0fad7320791, regionState=OPENING, regionLocation=pay-hubble-gsw06.dakao.io,16020,1630643113849
...

 

Once the hbase:meta Region has been assigned, the existing hbase:meta entries are loaded and then updated with the new information. (After a restart the Region locations change, so hbase:meta has to be updated.)
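The difference between a loaded entry and an updated one is visible in the regionLocation field: loaded entries still point at a pre-restart RegionServer id. As a rough sketch, the code below parses one line of each kind and labels the location as live or stale. The `LIVE_SERVERS` set is filled by hand from the registration log in step 5, and `parse` is a helper made up for this example.

```python
import re

# Message portions of one "Load" line and one "updating" line from the log above.
LOAD_LINE = (
    "Load hbase:meta entry region=ec3328c4416e71907739aea3b3e2bf7d, regionState=OPEN, "
    "lastHost=pay-hubble-gsw01.dakao.io,16020,1630633919889, "
    "regionLocation=pay-hubble-gsw01.dakao.io,16020,1630633919889, openSeqNum=28"
)
UPDATE_LINE = (
    "pid=51 updating hbase:meta row=e1d5109f5f74442273ba83392c1cead1, regionState=OPENING, "
    "regionLocation=pay-hubble-gsw01.dakao.io,16020,1630643113867"
)

# RegionServers registered after the restart (only the one needed here).
LIVE_SERVERS = {"pay-hubble-gsw01.dakao.io,16020,1630643113867"}

LOAD_RE = re.compile(
    r"Load hbase:meta entry region=(\w+), regionState=(\w+), .*regionLocation=(\S+), openSeqNum"
)
UPDATE_RE = re.compile(
    r"updating hbase:meta row=(\w+), regionState=(\w+), regionLocation=(\S+)"
)


def parse(line):
    """Return (kind, region, state, location), or None for unrelated lines."""
    for kind, pattern in (("load", LOAD_RE), ("update", UPDATE_RE)):
        match = pattern.search(line)
        if match:
            return (kind, *match.groups())
    return None


for line in (LOAD_LINE, UPDATE_LINE):
    kind, region, state, location = parse(line)
    status = "live" if location in LIVE_SERVERS else "stale"
    print(kind, region[:8], state, status)
```

An entry that stays stale after startup would point at a Region that was never reassigned.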

 

9. Assigning the Regions of all other tables


2021-09-03 13:25:28,025 INFO org.apache.hadoop.hbase.master.assignment.TransitRegionStateProcedure: Starting pid=11, ppid=2, state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE, locked=true; TransitRegionStateProcedure table=SYSTEM:LOG, region=82b03ccb78a1f255ce29ed9323c10675, ASSIGN; rit=OPEN, location=null; forceNewPlan=true, retain=false
2021-09-03 13:25:28,025 INFO org.apache.hadoop.hbase.master.assignment.TransitRegionStateProcedure: Starting pid=17, ppid=2, state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE, locked=true; TransitRegionStateProcedure table=SWAY:DR_CHECK, region=020be7b1c11a930cd892cbdf2b5ef631, ASSIGN; rit=OPEN, location=null; forceNewPlan=true, retain=false
2021-09-03 13:25:28,025 INFO org.apache.hadoop.hbase.master.assignment.TransitRegionStateProcedure: Starting pid=16, ppid=2, state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE, locked=true; TransitRegionStateProcedure table=SWAY:DR_CHECK, region=a9102f0fbd452dffb7c1ad4243b70f3b, ASSIGN; rit=OPEN, location=null; forceNewPlan=true, retain=false
2021-09-03 13:25:28,025 INFO org.apache.hadoop.hbase.master.assignment.TransitRegionStateProcedure: Starting pid=14, ppid=2, state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE, locked=true; TransitRegionStateProcedure table=SWAY:DR_CHECK, region=e0a7788350be3489d901e0d2879e3d25, ASSIGN; rit=OPEN, location=null; forceNewPlan=true, retain=false
2021-09-03 13:25:28,025 INFO org.apache.hadoop.hbase.master.assignment.TransitRegionStateProcedure: Starting pid=10, ppid=2, state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE, locked=true; TransitRegionStateProcedure table=SYSTEM:LOG, region=529511783be228f4193e05f0a230689e, ASSIGN; rit=OPEN, location=null; forceNewPlan=true, retain=false


...

 

Assignment now starts for the Regions of every table other than hbase:meta.

As before, subprocedures that assign the Regions are started under each parent Procedure.
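To see how the assign subprocedures are distributed, the "Starting pid=..." messages can be grouped by parent pid and table. This is a sketch over the five messages shown above only (the elided "..." lines are not included), and `assigns_per_parent_and_table` is a name invented here.

```python
import re
from collections import Counter

# Shared middle part of every message, copied from the log lines above.
STATE = "state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE, locked=true"

# Message portions of the five TransitRegionStateProcedure lines shown above.
MESSAGES = [
    f"Starting pid=11, ppid=2, {STATE}; TransitRegionStateProcedure table=SYSTEM:LOG, region=82b03ccb78a1f255ce29ed9323c10675, ASSIGN",
    f"Starting pid=17, ppid=2, {STATE}; TransitRegionStateProcedure table=SWAY:DR_CHECK, region=020be7b1c11a930cd892cbdf2b5ef631, ASSIGN",
    f"Starting pid=16, ppid=2, {STATE}; TransitRegionStateProcedure table=SWAY:DR_CHECK, region=a9102f0fbd452dffb7c1ad4243b70f3b, ASSIGN",
    f"Starting pid=14, ppid=2, {STATE}; TransitRegionStateProcedure table=SWAY:DR_CHECK, region=e0a7788350be3489d901e0d2879e3d25, ASSIGN",
    f"Starting pid=10, ppid=2, {STATE}; TransitRegionStateProcedure table=SYSTEM:LOG, region=529511783be228f4193e05f0a230689e, ASSIGN",
]

ASSIGN_RE = re.compile(
    r"Starting pid=(\d+), ppid=(\d+), .*"
    r"TransitRegionStateProcedure table=([^,]+), region=(\w+), ASSIGN"
)


def assigns_per_parent_and_table(messages):
    """Count ASSIGN subprocedures keyed by (parent pid, table name)."""
    counts = Counter()
    for message in messages:
        match = ASSIGN_RE.search(message)
        if match:
            parent_pid = int(match.group(2))
            table = match.group(3)
            counts[(parent_pid, table)] += 1
    return counts


for (parent_pid, table), count in sorted(assigns_per_parent_and_table(MESSAGES).items()):
    print(f"ppid={parent_pid} table={table} assigns={count}")
```

All five sample lines share ppid=2, the SCP scheduled for the old gsw02 server in step 3.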

 

 

10. Procedures (Server Crash Procedure) completed


 

# meta-region-server Crash Finished (meta = true)
2021-09-03 13:25:32,196 INFO org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished pid=1, state=SUCCESS; ServerCrashProcedure server=pay-hubble-gsw06.dakao.io,16020,1630633919867, splitWal=true, meta=true in 13.0310sec


# region server Crash Finished
2021-09-03 13:25:34,236 INFO org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished pid=2, state=SUCCESS; ServerCrashProcedure server=pay-hubble-gsw02.dakao.io,16020,1630633919945, splitWal=true, meta=false in 15.0180sec
...

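Once every Region has been assigned to a RegionServer with a new id, the Procedures that were started first (Server Crash Procedure) finally complete.

In other words, the work of reassigning the Regions that were hosted on the pre-restart RegionServers is done.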
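Since each "Finished" line reports how long the Procedure took, the slowest SCP can be picked out when investigating a slow or stuck restart. The following sketch parses the two lines shown above; `scp_durations` is a hypothetical helper and only lines with state=SUCCESS are counted.

```python
import re

# Message portions of the two "Finished" lines from the log above.
FINISHED_LINES = [
    "Finished pid=1, state=SUCCESS; ServerCrashProcedure "
    "server=pay-hubble-gsw06.dakao.io,16020,1630633919867, splitWal=true, meta=true in 13.0310sec",
    "Finished pid=2, state=SUCCESS; ServerCrashProcedure "
    "server=pay-hubble-gsw02.dakao.io,16020,1630633919945, splitWal=true, meta=false in 15.0180sec",
]

FINISHED_RE = re.compile(
    r"Finished pid=(\d+), state=(\w+); ServerCrashProcedure server=(\S+), "
    r"splitWal=(?:true|false), meta=(true|false) in ([\d.]+)sec"
)


def scp_durations(lines):
    """Map pid -> duration in seconds for successfully finished SCPs."""
    durations = {}
    for line in lines:
        match = FINISHED_RE.search(line)
        if match and match.group(2) == "SUCCESS":
            durations[int(match.group(1))] = float(match.group(5))
    return durations


durations = scp_durations(FINISHED_LINES)
slowest_pid = max(durations, key=durations.get)
print("durations:", durations)
print("slowest SCP pid:", slowest_pid)
```

An SCP pid that was scheduled in step 3 but never shows up here is a candidate for the restart problem.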

 

(Note)

Server Crash Procedure (SCP) spawns the WAL splitting tasks and then the reassign of all regions that were hosted on the crashed server as subprocedures.

https://hbase.apache.org/book.html#amv2
