Josh-CO Dev

Solving the world's problems one line of code at a time.


Powershell – Scrubbing Data From a Webpage

One thing I have found a lot of use for is writing a script that goes out to a website and scrubs data from it. With traditional programming languages this was always a little tricky without calling a web service or something similar to process the request. PowerShell makes it surprisingly easy, especially if you are familiar with HTML and know what to look for. Below is a commented script that covers the basics of scrubbing a site for data. In this case, I am hitting the NIST NVD (National Vulnerability Database) and passing in a CVE ID as a parameter to get more information about it.

#This function connects to the NIST NVD and retrieves the CVE data from the database.
#$Link - the URL of the CVE entry
#Note: ParsedHtml is only available in Windows PowerShell (5.1 and earlier); PowerShell 6+ does not provide it.
function GetCVEDataFromNIST($Link)
{
	#assign our parameter to a local variable
	$url = $Link
	#create a new web request to go out and fetch the HTML contents
	$result = Invoke-WebRequest $url
	#$html = $result.ParsedHtml.getElementsByTagName("span") | Where "classname" -match "^label" | Select -ExpandProperty InnerText
	#parse the HTML contents to find <p> tags whose class starts with "row". The inner text will contain the CVE summary
	$Summary = $result.ParsedHtml.getElementsByTagName("p") | Where "classname" -match "^row" | Select -ExpandProperty InnerText
	#parse the HTML contents to find <div> tags whose class starts with "row". The inner text will contain the published date and CVE score
	$Published = $result.ParsedHtml.getElementsByTagName("div") | Where "classname" -match "^row" | Select -ExpandProperty InnerText
	#start the concatenation of data: take the summary and add a new line
	$AllCVEData = ("{0} `n" -f $Summary)
	#loop through the published date and CVE score, putting each on its own line, and append them to $AllCVEData
	foreach($element in $Published)
	{
		#write-host $element
		$AllCVEData += "`n {0}" -f $element
	}
	#write-host $AllCVEData
	#return the summary, published date, and CVE score
	return $AllCVEData
}

A couple of notes about this. First, if you work for a corporation, you probably need to go through a proxy. See my other article for that. Second, you have to know the HTML of the site. In the code above, we are looking for <p> and <div> elements whose class starts with "row"; that class tells us where our data is.
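Because ParsedHtml only exists in Windows PowerShell, here is a minimal sketch of the same "find elements by tag and class" idea that works on the raw HTML string instead, so it also runs on PowerShell 7. The function name Get-ElementTextByClass is hypothetical, and it uses a regular expression rather than a real HTML parser, so it assumes simple, non-nested markup like the rows described above.

```powershell
# Hypothetical helper (not part of the original script): returns the inner text
# of every <Tag> element whose class attribute starts with $ClassPrefix.
# Regex-based, so it only handles simple, non-nested markup.
function Get-ElementTextByClass {
    param(
        [Parameter(Mandatory)] [string] $Html,
        [Parameter(Mandatory)] [string] $Tag,
        [Parameter(Mandatory)] [string] $ClassPrefix
    )
    $t = [regex]::Escape($Tag)
    $c = [regex]::Escape($ClassPrefix)
    # (?is) = case-insensitive, and '.' also matches newlines.
    # Matches <tag ... class="prefix..." ...>inner</tag>, stopping at the first closing tag.
    $pattern = '(?is)<{0}\b[^>]*\bclass\s*=\s*["'']{1}[^"'']*["''][^>]*>(.*?)</{0}>' -f $t, $c
    foreach ($m in [regex]::Matches($Html, $pattern)) {
        # Strip any nested tags, decode entities such as &amp;, and trim whitespace
        $text = [regex]::Replace($m.Groups[1].Value, '<[^>]+>', '')
        [System.Net.WebUtility]::HtmlDecode($text).Trim()
    }
}
```

Against a live page you would pass in the response body, for example `Get-ElementTextByClass -Html (Invoke-WebRequest $url).Content -Tag 'p' -ClassPrefix 'row'`. Keep in mind that the "row" class reflects the page layout at the time of writing; check the site's current HTML before relying on it.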

In this script, I build $Link in another function and pass it in to this one. For the curious-minded, this is how I am building the URL:

$Link = ("{0}&search_type=all&cves=on" -f $CVE)
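If you build the URL yourself, it is worth escaping the CVE value so stray characters such as "&" or spaces cannot add extra query-string parameters. This is a minimal sketch, assuming a hypothetical helper New-CVESearchLink and a placeholder $BaseUrl (the actual search address is not reproduced here); only the "&search_type=all&cves=on" part comes from the line above.

```powershell
# Hypothetical helper: appends an escaped CVE ID to a search URL.
# $BaseUrl is a placeholder that should end where the CVE value begins.
function New-CVESearchLink {
    param(
        [Parameter(Mandatory)] [string] $BaseUrl,
        [Parameter(Mandatory)] [string] $CVE
    )
    # EscapeDataString percent-encodes characters such as spaces (%20) and '&' (%26),
    # so the input stays a single query-string value.
    $escaped = [uri]::EscapeDataString($CVE.Trim())
    return ("{0}{1}&search_type=all&cves=on" -f $BaseUrl, $escaped)
}
```

For example, `New-CVESearchLink -BaseUrl 'https://example.test/search?query=' -CVE 'CVE-2014-0160'` produces `https://example.test/search?query=CVE-2014-0160&search_type=all&cves=on`, where example.test stands in for the real site.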

Again, you have to research the site. In this case, the URL takes the CVE ID as a parameter in the query string. You can simply swap in your own input and get custom results back. For the evil-minded out there, please don't try to fuzz the inputs on NIST's site; it likely won't end well for you.
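One simple way to keep your own input sane before it ever reaches the site is to check that it actually looks like a CVE ID: "CVE-", a four-digit year, a dash, then four or more digits. The Test-CVEId name below is hypothetical, and this is a rough sketch rather than a complete validator.

```powershell
# Hypothetical helper: returns $true only if the input looks like a CVE ID
# (CVE-YYYY-NNNN, with four or more digits in the sequence part).
function Test-CVEId {
    param([Parameter(Mandatory)] [AllowEmptyString()] [string] $CVE)
    # -match is case-insensitive by default, so 'cve-2021-44228' also passes.
    # [0-9] is used instead of \d so only ASCII digits are accepted.
    return ($CVE.Trim() -match '^CVE-[0-9]{4}-[0-9]{4,}$')
}
```

You could call it right before building $Link, for example `if (-not (Test-CVEId -CVE $CVE)) { throw "Not a CVE ID: $CVE" }`, so anything that is not a well-formed ID is rejected locally instead of being sent to NIST.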