Base64 encode and decode in Python: b64encode, urlsafe, and the bytes rule
Python's `base64` module is correct, complete, and routinely misused. The whole API is `bytes in, bytes out`. The `str` you want at the end is one decode call away.
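b64encode takes bytes, and both functions return bytes
base64.b64encode accepts bytes and returns bytes. Pass a str and you get TypeError. b64decode is more lenient on input (it also accepts an ASCII str), but it always returns bytes. The full round-trip from text to Base64 string and back looks like this: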
```py
import base64

text = "AldeaCode"
encoded = base64.b64encode(text.encode("utf-8")).decode("ascii")  # "QWxkZWFDb2Rl"
decoded = base64.b64decode(encoded).decode("utf-8")  # "AldeaCode"
```
The .encode("utf-8") and .decode(...) calls flank the Base64 work: one turns text into bytes before encoding, the other turns the resulting bytes back into a str. Forgetting them is one of the most common Python Base64 mistakes.
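As a rough illustration of the bytes rule, the sketch below shows the failure and the fix side by side. It assumes a stock CPython 3 interpreter; the variable names are illustrative only.

```python
import base64

text = "AldeaCode"

# Passing a str straight to b64encode raises TypeError.
try:
    base64.b64encode(text)
    raised = False
except TypeError:
    raised = True

# Encoding the text to bytes first works, and the result is still bytes.
encoded_bytes = base64.b64encode(text.encode("utf-8"))
assert isinstance(encoded_bytes, bytes)

# b64decode accepts an ASCII str as input, but it also returns bytes.
decoded_bytes = base64.b64decode(encoded_bytes.decode("ascii"))
assert decoded_bytes == b"AldeaCode"
```

The final .decode("utf-8") is still up to you: the library cannot know which text encoding the original bytes used.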
urlsafe is what tokens need
Standard Base64 uses + and / in its alphabet. Both are reserved characters in URLs and force percent-encoding the moment you try to use them as a query parameter or path segment.
base64.urlsafe_b64encode swaps + for - and / for _, which is what RFC 4648 §5 calls "Base 64 Encoding with URL and Filename Safe Alphabet". JWTs and most modern token formats expect the URL-safe variant, often without padding.
```py
def b64url_no_padding(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")
```
When decoding back, you need to re-pad to a multiple of 4 with = before urlsafe_b64decode will accept it.
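To make the alphabet swap and the padding requirement concrete, here is a small sketch. The two input bytes are an arbitrary choice, picked only because they encode to both special characters.

```python
import base64
import binascii

raw = b"\xfb\xff"  # arbitrary bytes that hit both "+" and "/" when encoded

standard = base64.b64encode(raw)          # b"+/8="
url_safe = base64.urlsafe_b64encode(raw)  # b"-_8="

# Strip the padding, as JWT-style tokens do.
unpadded = url_safe.rstrip(b"=").decode("ascii")  # "-_8"

# Decoding the unpadded form fails with binascii.Error.
try:
    base64.urlsafe_b64decode(unpadded)
    rejected = False
except binascii.Error:
    rejected = True

# Re-padding to a multiple of 4 makes it decodable again.
repadded = unpadded + "=" * (-len(unpadded) % 4)
restored = base64.urlsafe_b64decode(repadded)
assert restored == raw
```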
Files and binary data round-trip cleanly
Base64 was designed to ferry arbitrary bytes through channels that only carry text reliably: 7-bit email originally, and today JSON strings and XML attributes. Encoding a PNG into a data URI is one of its canonical uses:
```py
with open("logo.png", "rb") as f:
    data = f.read()

data_uri = "data:image/png;base64," + base64.b64encode(data).decode("ascii")
```
The 33% size overhead is fundamental to the format: 3 raw bytes become 4 ASCII characters. If you are sending Base64 images over the wire to save HTTP requests, measure first. For images larger than a few KB, a normal binary response is almost always faster.
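The overhead follows directly from the 3-to-4 mapping, and a short check makes it tangible. The helper name encoded_length is hypothetical, and the sizes are arbitrary sample values.

```python
import base64
import math


def encoded_length(n_bytes: int) -> int:
    """Padded Base64 length for n_bytes of input: 4 characters per 3-byte group."""
    return 4 * math.ceil(n_bytes / 3)


# Compare the formula against the library for a few arbitrary sizes.
for n in (1, 2, 3, 1000, 30000):
    actual = len(base64.b64encode(bytes(n)))
    assert actual == encoded_length(n)

# 1000 raw bytes become 1336 characters, so about 33.6% larger.
overhead_ratio = encoded_length(1000) / 1000
```

For small inputs the padding makes the relative overhead worse: a single byte still costs four characters.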
Padding and validation
RFC 4648 requires padding unless the specification that references it explicitly says otherwise. Formats that do exactly that are so widespread (JWT, OAuth tokens, JOSE) that you will hit unpadded strings constantly. Separately, b64decode silently discards characters outside the Base64 alphabet by default. Pass validate=True to reject them with binascii.Error instead, which is what you want for user input.
For unpadded input, pad it back yourself: s + "=" * (-len(s) % 4) rounds up to the next multiple of 4. Cleaner than trying to try/except your way through the library's own binascii.Error.
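The difference between the default and validate=True is easy to miss, so here is a minimal sketch. The input string is a made-up example: valid Base64 with one stray character inserted.

```python
import base64
import binascii

dirty = "QWxk!ZWFDb2Rl"  # "QWxkZWFDb2Rl" with a stray "!" in the middle

# Default behaviour: the "!" is silently discarded before decoding.
lenient = base64.b64decode(dirty)

# Strict behaviour: the same input is rejected.
try:
    base64.b64decode(dirty, validate=True)
    strict_rejected = False
except binascii.Error:
    strict_rejected = True
```

With the default, corrupted input can decode to plausible-looking bytes, which is why the strict flag is the safer choice at a trust boundary.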
Working example
```py
import base64

# Standard Base64 round-trip
text = "AldeaCode 🏠"
encoded = base64.b64encode(text.encode("utf-8")).decode("ascii")
decoded = base64.b64decode(encoded).decode("utf-8")
assert decoded == text


# URL-safe, unpadded (JWT style)
def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(s: str) -> bytes:
    padded = s + "=" * (-len(s) % 4)
    return base64.urlsafe_b64decode(padded)
```
Just need the result?
When you just want to inspect a Base64 chunk you grabbed from a JWT or a curl response, opening a Python shell is overkill. Paste it into the browser-based Base64 tool and see the decoded text and bytes immediately, with the URL-safe variant a click away.
Open Base64 Encoder and Decoder →
Frequently asked questions
Why does my Base64 string have a trailing newline?
You probably used base64.encodebytes. Following the MIME convention, it inserts a newline after every 76 characters of output and always ends with a trailing newline. Use b64encode for clean single-line output.
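A quick comparison, sketched with an arbitrary 60-byte payload so the output is long enough to wrap:

```python
import base64

data = b"x" * 60  # arbitrary payload; 60 bytes encode to 80 characters

mime_style = base64.encodebytes(data)  # wrapped at 76 chars, trailing newline
single_line = base64.b64encode(data)   # no newlines at all

newline_count = mime_style.count(b"\n")  # one wrap plus the trailing newline

# Removing the newlines gives exactly the b64encode output.
assert mime_style.replace(b"\n", b"") == single_line
```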
Can I Base64 encode emoji safely?
Yes. Encode the string to UTF-8 first to get bytes, then b64encode. Decoding is the reverse: b64decode and then .decode('utf-8'). Pass the str directly and b64encode raises TypeError. Decode the result with the wrong codec and you will mangle anything outside ASCII.
Is Base64 secure?
No. Base64 is encoding, not encryption. Anyone reading the string can decode it. Use it for transport, never for hiding secrets, and never as a substitute for a hash or HMAC.