High I/O Java process consistently gets signal 11 SIGSEGV in a JavaThread when run in a Docker container


Question:

Has anyone been able to consistently replicate SIGSEGVs on the JRE using different hardware and different JRE versions? Note (potentially a big note): I am running the process in a Docker container deployed on Kubernetes.

Sample error:

# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007fea64dd9d01, pid=21, tid=0x00007fe8dfbfb700
#
# JRE version: Java(TM) SE Runtime Environment (8.0_191-b12) (build 1.8.0_191-b12)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.191-b12 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# J 8706 C2 com.fasterxml.jackson.core.json.ReaderBasedJsonParser.nextFieldName()Ljava/lang/String; (493 bytes) @ 0x00007fea64dd9d01 [0x00007fea64dd9b60+0x1a1]

I'm currently managing a high I/O process that has many threads doing I/O and serialization: downloading CSVs and JSONs, reading CSVs, writing JSONs into CSVs, and loading CSVs into MySQL. I do this thousands of times during the application's run cycle. I use nothing but commonly-used libraries (Jackson, jOOQ) and "normal" code: specifically, I did not write custom code that uses the JNI.

Without fail, the JVM will SIGSEGV during each run cycle. It seems to SIGSEGV in various parts of the code base, but never in a GC thread or any other well-known JVM thread. The "problematic frame" is always compiled code.

Testing specs:

  • Multiple different hardware instances in AWS.
  • Tested with Java 8u191 and 8u181, on Ubuntu 16.04.
  • This process is running in a container (Docker) and deployed on Kubernetes.
  • Docker version: 17.03.2-ce

Here's the full log: https://gist.github.com/navkast/9c95f56ce818d76276684fa5bb9a6864


Answer 1:

From the full log:

siginfo: si_signo: 11 (SIGSEGV), si_code: 0 (SI_USER)

si_code 0 (SI_USER) means the SIGSEGV was delivered via kill() rather than raised by a faulting memory access, so this is not a JVM bug. Something is deliberately killing the process, most likely because the machine (or container) is running out of memory.


Answer 2:

Based on your comment, this is likely a case where your container memory limit is lower than your heap size plus the additional space the JVM needs outside the heap (GC overhead, metaspace, thread stacks, and so on).

There are a number of good write-ups on how to run the JVM inside a container. The short version: make the JVM size itself from the container's memory limit rather than the host's, either with an explicit -Xmx or (on 8u191+) with -XX:+UseContainerSupport and -XX:MaxRAMPercentage.

You didn't post any pod specs, but you can also take a look at setting resource limits on your Kubernetes pods.
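As a sanity check, a minimal sketch like the one below can be run inside the container to compare the JVM's configured max heap with the memory limit the container actually has. It assumes the cgroup v1 layout used by Docker 17.03 (the memory.limit_in_bytes path); the class name is illustrative.

/**
 * Hypothetical helper: prints the JVM's max heap next to the cgroup (v1)
 * memory limit so you can see how much headroom is left for non-heap
 * memory (GC structures, metaspace, thread stacks, direct buffers).
 */
import java.nio.file.Files;
import java.nio.file.Paths;

public class ContainerMemoryCheck {
    public static void main(String[] args) throws Exception {
        long maxHeap = Runtime.getRuntime().maxMemory();

        // cgroup v1 path; on cgroup v2 this would be /sys/fs/cgroup/memory.max instead
        String limitFile = "/sys/fs/cgroup/memory/memory.limit_in_bytes";
        long containerLimit = Long.parseLong(
                new String(Files.readAllBytes(Paths.get(limitFile))).trim());

        System.out.printf("Max heap (-Xmx):       %,d MiB%n", maxHeap / (1024 * 1024));
        System.out.printf("Container mem limit:   %,d MiB%n", containerLimit / (1024 * 1024));
        System.out.printf("Headroom for non-heap: %,d MiB%n",
                (containerLimit - maxHeap) / (1024 * 1024));
    }
}

If the headroom printed here is small or negative, the heap alone can push the container over its limit once GC and metaspace overhead are added on top.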


Answer 3:

A big hint is this line from the crash log:

 Memory: 4k page, physical 33554432k(1020k free), swap 0k(0k free)

Out of 32 GB of physical memory, only about 1 MB is free at the time of the crash. Most likely the process was killed because the system ran out of memory. I suggest one or more of the following (a sketch for measuring the actual footprint follows the list):

  • reducing the heap size significantly, e.g. to 2-8 GB;
  • increasing the available memory, e.g. to 4-16 GB;
  • adding some swap space, e.g. 8-32 GB; this doesn't fix the problem, but handles memory exhaustion a little more gracefully.
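To see how much the process actually uses beyond the heap before picking new -Xmx and limit values, something along the lines of the sketch below (class name and logging interval are arbitrary) can be added to the application and watched over a run cycle:

/**
 * Illustrative sketch: logs heap and non-heap usage as reported by the JVM.
 * The real process footprint (RSS) will be higher still, because GC
 * structures, thread stacks and direct buffers are not fully reflected here.
 */
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class FootprintLogger {
    public static void main(String[] args) throws InterruptedException {
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        while (true) {
            MemoryUsage heap = mem.getHeapMemoryUsage();
            MemoryUsage nonHeap = mem.getNonHeapMemoryUsage();
            System.out.printf(
                    "heap used/committed/max: %d/%d/%d MiB, non-heap used/committed: %d/%d MiB%n",
                    heap.getUsed() >> 20, heap.getCommitted() >> 20, heap.getMax() >> 20,
                    nonHeap.getUsed() >> 20, nonHeap.getCommitted() >> 20);
            Thread.sleep(60_000); // log once a minute
        }
    }
}

The process's resident set size will still be larger than the heap plus non-heap numbers reported here; that gap is exactly the overhead the container limit has to leave room for.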