Channel: Active questions tagged https - Stack Overflow

How can I list all resources in the SEC's specified website directory?

The SEC requires all bot requests to include the headers described in its bot request policy:

User-Agent: Sample Company Name AdminContact@<sample company domain>.com
Accept-Encoding: gzip, deflate
Host: www.sec.gov
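To avoid retyping the three headers on every request, they could be generated once in curl's config-file syntax. This is only a sketch: the function name `sec_header_config` and the contact string are hypothetical placeholders, not anything prescribed by the SEC policy.

```shell
# Hypothetical contact string; replace with your own company name and e-mail.
SEC_USER_AGENT="Sample Company Name AdminContact@example.com"

# Print one curl config line per required header.
# curl can read these lines from stdin with `--config -`.
sec_header_config() {
    printf 'header = "User-Agent: %s"\n' "$SEC_USER_AGENT"
    printf 'header = "Accept-Encoding: gzip, deflate"\n'
    printf 'header = "Host: www.sec.gov"\n'
}
```

Under these assumptions, a request would look like `sec_header_config | curl --config - -O "$resource"`.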

I can download one of the resources in the target directory https://www.sec.gov/Archives/edgar/daily-index/xbrl/, "companyfacts.zip", by passing these headers as arguments to curl:

target_dir="https://www.sec.gov/Archives/edgar/daily-index/xbrl/"
resource="$target_dir/companyfacts.zip"
curl "$resource" \
     -H "User-Agent: xxxx@gmail.com" \
     -H "Accept-Encoding:gzip, deflate" \
     -H "Host: www.sec.gov" \
     -O

The resource "https://www.sec.gov/Archives/edgar/daily-index/xbrl//companyfacts.zip" was downloaded successfully. Now I want to list all resources in the target directory "https://www.sec.gov/Archives/edgar/daily-index/xbrl/":
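The doubled slash in the downloaded URL comes from `target_dir` already ending in `/`. The server accepted it here, but a small helper can build the URL with exactly one separator. This is a minimal sketch; `join_url` is a hypothetical name, not part of curl or any SEC tooling.

```shell
# join_url BASE NAME: join the two parts with exactly one slash between them.
join_url() {
    base=${1%/}    # drop one trailing slash from the base, if present
    name=${2#/}    # drop one leading slash from the name, if present
    printf '%s/%s\n' "$base" "$name"
}

target_dir="https://www.sec.gov/Archives/edgar/daily-index/xbrl/"
resource=$(join_url "$target_dir" "companyfacts.zip")
# resource is now https://www.sec.gov/Archives/edgar/daily-index/xbrl/companyfacts.zip
```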

curl -H "User-Agent: xxxx@gmail.com" \
     -H "Accept-Encoding:gzip, deflate" \
     -H "Host: www.sec.gov" \
     --list-only "$target_dir" \
     --output - | zcat

The output:

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   215  100   215    0     0     88      0  0:00:02  0:00:02 --:--:--    88
<?xml version="1.0" encoding="UTF-8"?>
<Error>
  <Code>AccessDenied</Code>
  <Message>Access Denied</Message>
  <RequestId>XQXMWV75A7KEH1V9</RequestId>
  <HostId>dFP6F0LuD3BWVhEdYePnOmUmSREgcBWCdyfht0uYZ8r3NAhRYWpQ4li6JrZKZXokhau0TwqJDko= </HostId>
</Error>
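The body returned here is an XML error document, not a directory listing. A script could check for that before treating the output as a listing. The sketch below pulls the text out of the `<Code>` element with sed; `xml_error_code` is a hypothetical helper, the sample response is abbreviated from the output above, and a real XML parser would be more robust than this pattern match.

```shell
# xml_error_code: read a response body on stdin and print the text
# inside <Code>...</Code>; prints nothing if there is no such element.
xml_error_code() {
    sed -n 's:.*<Code>\([^<]*\)</Code>.*:\1:p'
}

# Abbreviated copy of the response shown above.
response='<?xml version="1.0" encoding="UTF-8"?><Error><Code>AccessDenied</Code><Message>Access Denied</Message></Error>'
code=$(printf '%s\n' "$response" | xml_error_code)

if [ -n "$code" ]; then
    echo "request failed: $code" >&2
fi
```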

How can I list all resources in the SEC's specified website directory?

