The SEC requires all bot requests to include the headers described in its bot request policy:

    User-Agent: Sample Company Name AdminContact@<sample company domain>.com
    Accept-Encoding: gzip, deflate
    Host: www.sec.gov

I can download one of the resources, "companyfacts.zip", from the target directory https://www.sec.gov/Archives/edgar/daily-index/xbrl/ by passing these headers as arguments to curl:

    target_dir="https://www.sec.gov/Archives/edgar/daily-index/xbrl/"
    resource="$target_dir/companyfacts.zip"
    curl "$resource" \
      -H "User-Agent: xxxx@gmail.com" \
      -H "Accept-Encoding:gzip, deflate" \
      -H "Host: www.sec.gov" \
      -O

The resource "https://www.sec.gov/Archives/edgar/daily-index/xbrl//companyfacts.zip" downloaded successfully. Now I want to list all resources in the target webpage "https://www.sec.gov/Archives/edgar/daily-index/xbrl/":

    curl -H "User-Agent: xxxx@gmail.com" \
      -H "Accept-Encoding:gzip, deflate" \
      -H "Host: www.sec.gov" \
      --list-only "$target_dir" \
      --output - | zcat

The output:

      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    100   215  100   215    0     0     88      0  0:00:02  0:00:02 --:--:--    88
    <?xml version="1.0" encoding="UTF-8"?>
    <Error><Code>AccessDenied</Code><Message>Access Denied</Message>
    <RequestId>XQXMWV75A7KEH1V9</RequestId>
    <HostId>dFP6F0LuD3BWVhEdYePnOmUmSREgcBWCdyfht0uYZ8r3NAhRYWpQ4li6JrZKZXokhau0TwqJDko=
    </HostId></Error>

How can I list all resources in the SEC's specified website directory?
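
A side note on the download step: the doubled slash in the downloaded URL appears because `target_dir` already ends with `/` and the resource path adds another. The download worked anyway, so this is cosmetic, but a minimal sketch of joining the two parts cleanly might look like the following. `join_url` is a hypothetical helper introduced only for illustration; it is not part of the original commands.

```shell
#!/bin/sh
# join_url is a hypothetical helper (an assumption, not from the original post).
# It strips one trailing slash from the base URL, then appends the file name.
join_url() {
  printf '%s/%s\n' "${1%/}" "$2"
}

target_dir="https://www.sec.gov/Archives/edgar/daily-index/xbrl/"
resource="$(join_url "$target_dir" "companyfacts.zip")"

# Show the joined URL; it can be passed to curl in place of the original $resource.
echo "$resource"
```

The `${1%/}` expansion removes at most one trailing slash, so the helper behaves the same whether or not the base URL ends with one.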