Hi All,
I am new to PowerShell. I was learning about web scraping I have created below script to fetch data from flipkart and save it to excel.
$excel = New-Object -ComObject excel.application $excel.Application.Visible = $false $excel.DisplayAlerts = $false#dont prompt the user $book=$excel.Workbooks.Add() $sheet1 = $book.Worksheets.Item(1) $sheet1.name = 'FLIPKART' $sheet1.Activate() | Out-Null $sheet1.Cells.Item(1,1)="Name" $sheet1.Cells.Item(1,2)="Rating" $sheet1.Cells.Item(1,3)="RAM/ROM" $sheet1.Cells.Item(1,4)="Display" $sheet1.Cells.Item(1,5)="Camera" $sheet1.Cells.Item(1,6)="Battery" $sheet1.Cells.Item(1,7)="Processor" $sheet1.Cells.Item(1,8)="Warranty" $sheet1.Cells.Item(1,9)="Prize" $rowCounter=2; [int]$pages=Read-Host "Enter no of pages" 1..$pages |ForEach-Object { $page=Invoke-WebRequest -Uri "https://www.flipkart.com/search?q=mobiles&sid=tyy%2C4io&as=on&as-show=on&otracker=AS_QueryStore_OrganicAutoSuggest_1_7_na_na_na&otracker1=AS_QueryStore_OrganicAutoSuggest_1_7_na_na_na&as-pos=1&as-type=RECENT&suggestionId=mobiles%7CMobiles&requestId=192f78df-047c-41ee-a82b-b82d74eb9811&as-searchtext=mobiles&page=$_" $products=$page.ParsedHtml.getElementsByTagName("div")|Where-Object {$_.className -eq "_1-2Iqu row"} for($i=0;$i -le $products.Count;$i++) { if($products[$i] -ne $null) { $sheet1.Cells.Item($rowCounter,1)=$products[$i].childNodes[0].childNodes[0].innerHTML $sheet1.Cells.Item($rowCounter,2)=$products[$i].childNodes[0].childNodes[1].childNodes[0].innerText $sheet1.Cells.Item($rowCounter,3)=$products[$i].childNodes[0].childNodes[2].childNodes[0].childNodes[0].innerText $sheet1.Cells.Item($rowCounter,4)=$products[$i].childNodes[0].childNodes[2].childNodes[0].childNodes[1].innerText $sheet1.Cells.Item($rowCounter,5)=$products[$i].childNodes[0].childNodes[2].childNodes[0].childNodes[2].innerText $sheet1.Cells.Item($rowCounter,6)=$products[$i].childNodes[0].childNodes[2].childNodes[0].childNodes[3].innerText $sheet1.Cells.Item($rowCounter,7)=$products[$i].childNodes[0].childNodes[2].childNodes[0].childNodes[4].innerText $sheet1.Cells.Item($rowCounter,8)=$products[$i].childNodes[0].childNodes[2].childNodes[0].childNodes[5].innerText $sheet1.Cells.Item($rowCounter,9)=$products[$i].childNodes[1].childNodes[0].childNodes[0].childNodes[0].innerHTML $rowCounter++; } } } $path="C:\Users\$env:USERNAME\Desktop\FLIPKART_mobiles.xlsx" $result=Test-path -path $path if($result) {Remove-Item $path} $book.SaveAs($path) Write-Host "File saved as "$path $excel.quit()
I am facing 2 issues
1st issue : whenever I start new PowerShell console window the script works fine for first 1 or 2 attempt after that it shows error like below:
PS C:\Users\lenovo\Desktop\Ramakant\powershell> .\temp.ps1 Enter no of pages: 1 Exception from HRESULT: 0x800A138A At C:\Users\lenovo\Desktop\Ramakant\powershell\temp.ps1:77 char:5 + $products=$page.ParsedHtml.getElementsByTagName("div")|Where-Obje ... + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + CategoryInfo : OperationStopped: (:) [], COMException + FullyQualifiedErrorId : System.Runtime.InteropServices.COMException Cannot index into a null array. At C:\Users\lenovo\Desktop\Ramakant\powershell\temp.ps1:80 char:12 + if($products[$i] -ne $null) + ~~~~~~~~~~~~~~~~~~~~~~~ + CategoryInfo : InvalidOperation: (:) [], RuntimeException + FullyQualifiedErrorId : NullArray
2nd issue : Every product has Name, Rating, Prize but other information may not available How to check if childNodes is present or not. it gives me error like this if any product has not any childNodes but script able to add all other available data
Enter no of pages: 3 Invalid argument. At C:\Users\lenovo\Desktop\Ramakant\powershell\temp.ps1:88 char:13 + $sheet1.Cells.Item($rowCounter,7)=$products[$i].childNode ... + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + CategoryInfo : OperationStopped: (:) [], ArgumentException + FullyQualifiedErrorId : System.ArgumentException
Thanks