Discussion:
namenode and datanode "Block Pool Used" abnormal growth
tao tony
2018-06-01 05:56:13 UTC
Permalink
hi,

I used Apache HAWQ to write data to HDFS 2.7.3 and ran into a strange problem.

In total I wrote 300MB of data, committed in 100 batches of 3MB each. But
"Block Pool Used" on each datanode increased by more than 30GB, and "Block
Pool Used" reported by the namenode increased by 100GB. Yet when I run
"hadoop fs -du -h /", the used space only grows by 300MB, and the block
count does not change. If I keep committing small batches, "Block Pool
Used" climbs above 100% and writes start failing with a no-space-left error.
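The write pattern is roughly like the sketch below. This is only an
illustration of what I believe HAWQ is doing; the path names and the
one-file-per-batch layout are my assumptions:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Illustration only: many small writes, each flushed to HDFS,
// with the output streams kept open between commits.
public class SmallCommits {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        byte[] batch = new byte[3 * 1024 * 1024];              // 3MB per commit
        FSDataOutputStream[] streams = new FSDataOutputStream[100];
        for (int i = 0; i < 100; i++) {
            streams[i] = fs.create(new Path("/tmp/seg-" + i)); // hypothetical path
            streams[i].write(batch);
            streams[i].hflush();  // data is visible to readers, file stays open
        }
        // ... later, the streams are closed and usage returns to normal
        for (FSDataOutputStream out : streams) {
            out.close();
        }
    }
}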

After several minutes, "Block Pool Used" gradually decreases back to
normal.

I didn't see anything in the namenode or datanode logs about reclaiming
the "Block Pool Used" space.

Could anyone explain why this happens and how I can solve this problem?
Many thanks!


Tao Jin

Kihwal Lee
2018-06-01 13:18:49 UTC
Permalink
That's because the files were still open. You get billed for the entire
block until the file is closed (block is finalized).
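For a rough sanity check, assuming the default 128MB block size and
replication factor 3: every file that is still open is billed a full
128MB x 3 = 384MB of "Block Pool Used" for its last block, no matter how
little data was actually written to it. A hundred such open files come to
about 37.5GB across the cluster, the same order of magnitude you are
seeing; if HAWQ keeps more than one file open per commit, the number goes
up accordingly.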
As an experiment, try reducing "dfs.blocksize" by half.
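Since dfs.blocksize is a client-side setting, you should be able to test
this without changing the cluster config. A minimal sketch (the path is
illustrative, and 64MB is just the default halved):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HalfBlockSize {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Halve the block size for files created by this client (default 128MB).
        conf.setLong("dfs.blocksize", 64L * 1024 * 1024);
        FileSystem fs = FileSystem.get(conf);
        try (FSDataOutputStream out = fs.create(new Path("/tmp/half-block-test"))) {
            out.write(new byte[3 * 1024 * 1024]); // one 3MB commit, as in your test
            out.hflush();
            // While this file is open it should reserve 64MB per replica, not 128MB.
        }
    }
}

If the "Block Pool Used" growth also halves, that confirms the open-file
accounting is what you are seeing.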

Kihwal
Post by tao tony
hi,
I used Apache HAWQ to write data to HDFS 2.7.3 and ran into a strange problem.
In total I wrote 300MB of data, committed in 100 batches of 3MB each. But
"Block Pool Used" on each datanode increased by more than 30GB, and "Block
Pool Used" reported by the namenode increased by 100GB. Yet when I run
"hadoop fs -du -h /", the used space only grows by 300MB, and the block
count does not change. If I keep committing small batches, "Block Pool
Used" climbs above 100% and writes start failing with a no-space-left error.
After several minutes, "Block Pool Used" gradually decreases back to normal.
I didn't see anything in the namenode or datanode logs about reclaiming
the "Block Pool Used" space.
Could anyone explain why this happens and how I can solve this problem?
Many thanks!
Tao Jin