Docker and Invalid Characters
I keep on running into interesting problems while migrating applications into our Kubernetes(k8s) environment.
Recently, a Spring Boot based application started returning invalid characters once it was ported to k8s. Instead of
returning a ό
, it returned with a set of garbage characters.
In my first attempt to debug the issue, I started the application in an IntelliJ IDE, and it returned with a correct response. It was obvious the invalid character problem was related to encoding issue; I wasn’t sure about the layer which was causing it. I have across similar issues related to URL encoding but never in a HTTP response body.
Now, if you think a little more closely about this application’s new environment; the only variables were the k8s networking layer and the operating system due to dockerization. Although a k8s networking is a different beast compared to my application’s old networking tier, it’s normally not a factor that affects encoding. While an application’s OS is a different story.
A program locale plays a significant role in defining an application’s code sets, date and time formatting, monetary
conventions, and decimal formatting. In a Unix based system such as Linux, the default locale codeset is usually
set to UTF-8
, an ASCII compatible 8-bit encoding form of Unicode. It made me realize that the next step in my
investigation should involve comparing the OS locale of the different environments. I checked the
locale of each environment by executing the locale
command from the command line.
Here’s the locale output of my old CentOS KVM environment,
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
To check the docker container locale, I got a shell to a running container of my application by executing the
kubectl exec
command ("kubectl exec -ti myservice-9c8b5ff5f-56gw6 -c myservice-app -n mynamespace sh"
). Here’s the
locale output of my application’s k8s container,
LANG=
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=
sh-4.2#
If you compare the two locale outputs, you will notice that the variables in my old KVM are set to en_US.UTF-8
while
the same variables in the k8s are set to POSIX
locale, also known as C locale. To match the locale of
the k8s with the KVM, I had to set the locale environment variables in my application’s Dockerfile
. Here are
the changes I made to my application’s Dockerfile to set the locale environment,
FROM centos:7
# Install java 11
RUN yum -q -y install java-11-openjdk && \
yum clean all
# Set the locale
ENV LANG en_US.UTF-8
ENV LANGUAGE en_US:en
ENV LC_ALL en_US.UTF-8
...
Once the changes were made, the docker image was rebuilt and deployed in the k8s environment. Guess what, the application response was back to normal.