I'm using a Python script to create a man-in-the-middle (MITM) proxy for intercepting HTTPS traffic. The script captures and logs the requests and responses to a log file. While the headers of the HTTPS requests and responses are readable as plain text, the bodies are logged as byte strings, making them difficult to interpret.
Here’s the relevant part of my script that handles the HTTPS request and response:
    import socket

    def relay_data(self, s_ssl, conn_ssl, buffer_size, url):
        website_response = []
        while True:
            # Client -> server
            try:
                request = conn_ssl.recv(buffer_size)
                if not request:
                    break
                s_ssl.sendall(request)
            except socket.error:
                pass
            # Server -> client; keep a copy of every chunk for logging
            try:
                response = s_ssl.recv(buffer_size)
                if not response:
                    break
                conn_ssl.sendall(response)
                if response:
                    website_response.append(response)
            except socket.error:
                pass
        if website_response:
            log_response_data(website_response, "https")
        print(f"Request completed (HTTPS) [{url}]")

    def log_response_data(website_response, protocol):
        formatted_response, demarcation = "", "____________________________________________________________________________________________________"
        for response in website_response:
            try:
                formatted_response += response.decode('utf-8')
            except UnicodeDecodeError:
                # Fall back to the bytes repr; this is what shows up as b'...' in the log
                formatted_response += str(response)
        if formatted_response:
            with open(f"{protocol}_log_file", "a") as F:
                F.write(f"{demarcation}\n{formatted_response}\n{demarcation}\n\n")
The relay_data function collects the HTTPS response chunks in the website_response list and then passes them to log_response_data, which tries to decode each chunk as UTF-8. However, since HTTPS response bodies often contain binary data (e.g., images, files, encrypted content), the decoding fails, and the log ends up containing raw byte strings, as shown below (the response is truncated):
    ______________________________________________________________________________________________________________________
    b'HTTP/1.1 200 OKVary: Accept-EncodingContent-Encoding: brContent-Type: text/css; charset=utf-8Access-Control-Allow-Origin: *Last-Modified: Mon, 01 Jan 2001 08:00:00 GMTExpires: Mon, 07 Jul 2025 16:39:40 GMTCache-Control: public,max-age=31536000,immutablereporting-endpoints: permissions_policy="https://www.xx.facebook.com/ajax/browser_error_reports/"timing-allow-origin: *document-policy: force-load-at-toppermissions-policy: accelerometer=(), attribution-reporting=(), autoplay=(), battery=(self), bluetooth=(), camera=(), ch-device-memory=(), ch-downlink=(), ch-dpr=(), ch-ect=(), ch-rtt=(), ch-save-data=(), ch-ua-arch=(), ch-ua-bitness=(), ch-viewport-height=(), ch-viewport-width=(), ch-width=(), clipboard-read=(), clipboard-write=(), compute-pressure=(), display-capture=(), encrypted-media=(), fullscreen=(self), gamepad=(), geolocation=(), gyroscope=(), hid=(), idle-detection=(), interest-cohort=(), keyboard-map=(), local-fonts=(), magnetometer=(), microphone=(), midi=(), otp-credentials=(), payment=(), picture-in-picture=(), private-state-token-issuance=(), publickey-credentials-get=(), screen-wake-lock=(), serial=(), shared-storage=(), shared-storage-select-url=(), private-state-token-redemption=(), usb=(), usb-unrestricted=(), unload=(self), window-management=(), xr-spatial-tracking=();report-to="permissions_policy"cross-origin-resource-policy: cross-originX-Content-Type-Options: nosniffreport-to: {"max_age":21600,"endpoints":[{"url":"https:\/\/www.xx.facebook.com\/ajax\/browser_error_reports\/"}],"group":"permissions_policy"}content-md5: yBQVMFwk1cIEUVIB4cPFJw==X-FB-Debug: FVUkfTRg3dvvwbDFhD8Xj5Bxk8qMudZ3UR/oe+x9HEIHxg+Wh2aQAJqhqUReqxiHUgi/KHRKaUjPsQ3bnnIrkg==Date: Mon, 08 Jul 2024 11:33:02 GMTX-FB-Connection-Quality: MODERATE; q=0.3, rtt=152, rtx=0, c=13, mss=1368, tbw=2569, tp=-1, tpl=-1, uplat=1, ullat=-1Alt-Svc: h3=":443"; ma=86400Connection: keep-aliveContent-Length: 10118'b'\xe2'b'\x1d\x96\x88\xa2>\x04(B\x86\xb9G\x7fi\xdf\x7f~\xbe,U\xa36\xfb\xc0\xe3\x1b\x1b\xa4j\xa7\x9d\xe9\xbc\xee\xf6\xda\xc9\x1e72\xd8$N\xcc\x11s\x84\x14q\xbd\x99\xf6\xa64\x00\xe4\xdc\x99\xec\xe7\xb2\x95+\"\x00\xca\xf9L\xa9\xf1A\xc26\xef\xd5\xcd\xec\x02U\x0bW\x05\xba\xaa#\xbfs\xb4\x92\xef\xd7\xdd3\xdc}3\xa0\xb0\x86\x12\xb9\xcbc\x1d\x16<\xe3\xe9d\x9cM\x7f\x96\xc8\xd9TY|A\xa8R\x12)HB=\x86\xea\xcb\xdeo\x14\t\"\xf2\xcc\xb6e\xb8l\xf4d\xe2\xf7H\xb0@\x91\x90\xf0\xbbp\xd3\xe8\xe9\xe0\xaf\x050\xbfz\xb6\x94\xdcd\xeb\xadq\x92\x80xx\x17\x86\xb6K\xd7\xa6`\xa11r\x01uU\xc5+Q%\xb3\xf4\xe9\x14\x81\xee*A\x93\xbc\xa3\x8c\x97+\xec\x8e}\xda\xbbg\x8bv\x16\x04?*E\x80v3\x90\xb7?!\xec\x0b}\x87\"\xb0\x9d\x83t\xdb{\xb7\x11!\xe8\xc23c\x91^\xfc\xe1\xc1\xceC\xcc\xa3\x9d\xb4\xc3\x10\xc5-\xb5\xe0\xd5K\xaa\xcf\x03\xd9\xff<\xf0\xe1\xde}\xdat\x04%\x14\xc01L}\x1c`\xef\xc3\xe8b8\xf2\xcf\xc5\x02\t6J/ \x03\x9e9\t\x1e/<R\xb3(\xd8\x12\x12\xd4q\xf6\xbb\'\x00\x16\xd6v6\xbc\x8d\xc5\xc2\x8a%Y\x9a\x83\xa5\x8ad\xfb\xd9(\x16r\xa6t\xc2^\xee\xa51\xde\xd8m\x05\x1d/\x84DA\xc4\xc0Z\xd6\xdb x\xff\x1e\xd6\xcd\x06\xe6\x1c\x8fb\xa1\xc2F\xcc\x90\xf90\xc3c\x15}\xbb\xb5_\xd8j0e\xd6\xd0)}\x81\x9c:\xf30\xac\x8f\x01\xe8\xea}Sx\xc8\x03\xb8\x86X<\xa3*)4\xac\x08\xce\xc2\xde\xa8\x87\x1d5I7\xd7\xdf\xde\xdc\x88\xf7\xdf\x03\xc0<>\x15\xb0\xd6e7\xd2\x98\xb9$\xfd\xdd\xfa\xc9\xbe_\x0f\xde\xech#p\xd9/\x15\x81o\xc1i\x1f\x01\xf9\x99]\x01\x0e\xf7\xd5FC\xec\xb2xt\x11\x88\xa5E\'\xbe\xf4w\x8b\xc0\x83w\xcd\xe9U\x97\xbb\xfb\xdf\xf9\xa9m\x86\x08\xdc\xc2\xdd\xd3\xdb\xee\xaf|\xfc\xcb\xfe\xb2\xfb\xd1\xebp\xdb`\xf7\x8f\xe0\x87K\xdb\xd0\xdf\xf2vg\xf7\xad\x05\x7f{\x06\xbf[\xe1\xb2\xdb\xac\x0fe*\x016\xeen\x11\xbf\xe4v\x9b\x8d\x8d\x1b\xf0e,L\xfe\x9b\x97\xfe\xed7NQ\xcc+\x16\x81\xbf\xfd\x86\x91F1F\xd5+\x96!\xe93?'........
    ______________________________________________________________________________________________________________________
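One detail in the sample above is worth noting: the headers include `Content-Encoding: br` together with `Content-Type: text/css`, so this particular body is most likely Brotli-compressed text rather than encrypted or inherently binary data. A minimal sketch of undoing the content encoding before logging is shown here. `split_response` and `decode_body` are hypothetical helper names, the sketch assumes the complete response was captured and is not sent with `Transfer-Encoding: chunked` (which would need de-chunking first), and Brotli support relies on the third-party `brotli` package being installed.

```python
import gzip
import zlib

try:
    import brotli  # third-party; assumed installed via `pip install brotli`
except ImportError:
    brotli = None


def split_response(chunks):
    """Join recv() chunks, then split the header block from the body."""
    raw = b"".join(chunks)
    head, _, body = raw.partition(b"\r\n\r\n")
    # ISO-8859-1 maps every byte to a character, so this decode cannot fail.
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body


def decode_body(headers, body):
    """Undo Content-Encoding; return the body unchanged if unsupported."""
    encoding = headers.get("content-encoding", "").lower()
    if encoding == "gzip":
        return gzip.decompress(body)
    if encoding == "deflate":
        try:
            return zlib.decompress(body)
        except zlib.error:
            return zlib.decompress(body, -zlib.MAX_WBITS)  # raw deflate stream
    if encoding == "br" and brotli is not None:
        return brotli.decompress(body)
    return body


# Simulated chunks standing in for what s_ssl.recv() might return.
chunks = [
    b"HTTP/1.1 200 OK\r\nContent-Encoding: gzip\r\n",
    b"Content-Type: text/css; charset=utf-8\r\n\r\n",
    gzip.compress(b"body { color: red; }"),
]
status, headers, body = split_response(chunks)
print(decode_body(headers, body).decode("utf-8"))
```

Joining the chunks before decoding also matters for plain text: a multi-byte UTF-8 character can be split across two `recv()` calls, in which case decoding each chunk separately fails even though the full body is valid UTF-8.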
My questions are:
- Why do the HTTPS response bodies appear as bytes in my log file?
- How can I decode/decrypt the byte data in my log file?
- Should I distinguish between different content types and handle them separately? If so, how can I implement this in my script?
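As a possible starting point for the third question, here is a minimal sketch of content-type-aware logging: textual bodies are decoded using the charset from `Content-Type`, and anything else is summarised in one line instead of being dumped into the log. `TEXT_TYPES` and `render_body` are hypothetical names, the set of media types treated as text is an illustrative assumption rather than a complete rule, and the body is assumed to have been decompressed already.

```python
from email.message import Message

# Media types treated as text in addition to text/*; an illustrative assumption.
TEXT_TYPES = {
    "application/json",
    "application/javascript",
    "application/xml",
    "application/x-www-form-urlencoded",
}


def render_body(content_type, body):
    """Return a log-friendly string for an already-decompressed body."""
    # Let the stdlib header parser split the media type from its parameters.
    msg = Message()
    msg["Content-Type"] = content_type or "application/octet-stream"
    media_type = msg.get_content_type()
    charset = msg.get_content_charset() or "utf-8"

    if media_type.startswith("text/") or media_type in TEXT_TYPES:
        try:
            return body.decode(charset, errors="replace")
        except LookupError:  # charset name Python does not know
            return body.decode("utf-8", errors="replace")
    # Binary payloads (images, fonts, archives) get a short placeholder.
    return f"<{len(body)} bytes of {media_type}, not logged>"


print(render_body("text/css; charset=utf-8", b"body { color: red; }"))
print(render_body("image/png", b"\x89PNG\r\n\x1a\n"))
```

With `errors="replace"`, an invalid byte sequence becomes a replacement character instead of raising, so one bad byte does not force the whole body back to a `b'...'` dump.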
Any guidance on improving the readability of my HTTPS response logs would be greatly appreciated!