Finding content in a file without having to read every line

When programming with WinBatch (winbatch.com) I could read the contents of a file into a binary buffer, so I did not need to read each line. I know I could use the select-string command, but if I have multiple sections in the file and each section contains the string to be extracted, then select-string would not work as well. In such a case it would be easier to go to the start of the section, find the end of the section, extract the whole section, and then extract the line from it, instead of reading the file a line at a time and keeping track of when you enter and exit a section.

So could this be written in PowerShell? (This is WinBatch code):

File_HostConnect  = "c:\temp\HostConnect.txt"
fs_HostConnect = FileSize(File_HostConnect)
binbuf_HostConnect = BinaryAlloc(fs_HostConnect)
BinaryRead( binbuf_HostConnect, File_HostConnect)
Offset_HostConnect = 0 

while @true
      Str_Begin = BinaryIndexNc( binbuf_HostConnect, Offset_HostConnect, "Some string", @FWDSCAN)
      If Str_Begin == 0 then Break
      Str_End = BinaryIndexNc( binbuf_HostConnect, Str_Begin, "end of string", @FWDSCAN) 
      If Str_End == 0 then Break
      Offset_HostConnect = Str_End
      Str_Full = strtrim(BinaryPeekStr(binbuf_HostConnect, Str_Begin, Str_End - Str_Begin))
      Break
Endwhile
BinaryFree(binbuf_HostConnect)

 

but if I have multiple sections in the file and each section contains the string to be extracted, then select-string would not work as well.

Can you explain what you mean by it wouldn’t work as well? You can find all matches with Select-String and it will give you an object with the line and line number where it was found, the matching term, etc.

Yes, you can find matches with Select-String, but you do not know what each match relates to. That's why you need either flags or something like a binary buffer (as above).

What it relates to in regards to what? If you’re searching file A for text “blah”, the matching text relates to file A. Can you please explain what you mean in detail?


I am already able to extract the contents with PowerShell by using flags, as I mentioned in my earlier post. But my question is: is there a way to avoid reading each line?

For example, let's say you have to extract “some text” from an HTML page and the line in question is “DIV some-text DV”. But there are tons of DIV elements on the page, so how would you know which specific DIV to extract the text from? For this you need to find a unique identifier before this line, which could well be several lines higher up. That's why Select-String will not work in this case: when it returns a match, you do not know which DIV it belongs to, as there could be several of them on the page.

I have done this by using flags. When I encounter the unique identifier I set my flag to 1, and when “some text” is found, maybe on the next line or several lines below, I set the flag back to zero. But this means reading each line and setting and resetting flags. That brings me back to my original question: can I read the whole page and go right to the word/section I want to access, instead of having to read each line of the file? Please re-read my original question.
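Expressed in PowerShell, that request might look roughly like the sketch below: read the whole file into one string with Get-Content -Raw, then jump between markers with String.IndexOf. Treat it as a sketch under assumptions. The two marker strings are the ones from the WinBatch code above, while the sample file name and its contents are made up so the snippet runs on its own. Unlike the WinBatch loop, which stops after the first section, this one collects every section.

```powershell
# Hypothetical sample file so the sketch is runnable; point $path at the real file instead.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'HostConnect-sample.txt'
Set-Content -Path $path -Value @(
    'header line'
    'Some string: first section'
    'wanted line 1'
    'end of string'
    'filler'
    'SOME STRING: second section'
    'wanted line 2'
    'end of string'
)

# Read the whole file into a single string (no per-line loop in the script).
$text = Get-Content -Path $path -Raw

$startMarker = 'Some string'
$endMarker   = 'end of string'
$ignoreCase  = [System.StringComparison]::OrdinalIgnoreCase

$sections = New-Object System.Collections.Generic.List[string]
$offset = 0
while ($true) {
    # IndexOf returns -1 when the marker is not found at or after $offset.
    $begin = $text.IndexOf($startMarker, $offset, $ignoreCase)
    if ($begin -lt 0) { break }
    $end = $text.IndexOf($endMarker, $begin, $ignoreCase)
    if ($end -lt 0) { break }

    # Everything from the start marker up to (not including) the end marker.
    $sections.Add($text.Substring($begin, $end - $begin).Trim())

    # Continue searching after the end marker just consumed.
    $offset = $end + $endMarker.Length
}

$sections
Remove-Item -Path $path
```

Each entry in $sections is one whole section as a single string, so the wanted line can then be pulled out of it with -split or a regex.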

Your original question is lacking clarity, just like your responses. What criteria (flag) are you wanting to use to differentiate sections? Your analogy of DIV doesn’t help, as you still haven’t shown how you differentiate the sections. Based on the proprietary code in your original post, it would appear “end of string” is the marker? If that’s the case, simply state that, because you can keep track of sections using a marker if the marker is known. I’d say, based on the lack of responses, others are not able to discern what it is you are actually trying to extract, how to differentiate sections, etc. Good luck, hopefully someone else can read your mind.

It's evident you are not trying to understand the question or the responses. Maybe by posting suggestions you get points for it. If you don’t know, please give someone else the opportunity to suggest a solution, instead of asking questions to no end when everything has been specified in the first post to start with !!!

Also if you keep ridiculing people because of your inadequacy to understand simple technology, it will only reflect badly on the forum. Be nice to people !!!

Post #279570 was reported as containing Inappropriate content, but I see no evidence of that. It appears to be mostly constructive criticism. I have cleared the report.

In general, asking questions is to be expected and encouraged, as none of us can see into another’s work environment and we must offer advice while effectively blind. If someone says that they do not understand what you are asking, you should take them at their word.

Please do not abuse the Report feature.

As I read this, here is my take. If you read a file into a binary buffer, then the ONLY way to identify where in that buffer the result is found is to set a flag. I totally get Doug’s point that with Select-String you don’t need flags, as PS has in essence taken care of that for you by identifying where in the file the strings reside. I agree with Doug: using Select-String, flags should not be needed.

You also seem to have an issue with reading each line of the file with Select-String, yet with your “binary buffer” approach, to find all instances you are doing the very same thing.

Just my $.02

You could potentially stream text or bytes of a line without reading the whole line. However, something must instruct the code to stop reading (for example, finding what it’s looking for early). The same can be said for reading lines: not all lines have to be read if the target is found early. Most of the pattern matching commands can span multiple lines and apply lookbehind or lookahead matching. Combining that with other control flow logic could implement tracking flags for any associations you want to create.
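To make the early-stop idea concrete, here is a rough sketch using a StreamReader and a single tracking flag. The file name, its contents, and the [tag4] marker are all made up for illustration. The loop stops as soon as the line after the marker has been read, although the reader itself may still buffer more bytes from disk than the script consumes.

```powershell
# Hypothetical sample file so the sketch runs on its own.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'early-stop-sample.txt'
Set-Content -Path $path -Value @(
    '[tag3]', 'top table',
    '[tag4]', 'middle table',
    '[tag5]', 'bottom table'
)

$inSection = $false   # tracking flag: has the [tag4] header been seen yet?
$found     = $null
$linesRead = 0

$reader = New-Object System.IO.StreamReader -ArgumentList $path
try {
    # ReadLine returns $null at end of file.
    while ($null -ne ($line = $reader.ReadLine())) {
        $linesRead++
        if ($inSection) {
            # First line after the header is the one we want; stop reading here.
            $found = $line
            break
        }
        if ($line -eq '[tag4]') { $inSection = $true }
    }
}
finally {
    $reader.Dispose()
}

$found
Remove-Item -Path $path
```

Here $linesRead ends at 4 of the 6 lines, so the rest of the file is never handed to the script.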

If I am working with a trivial version of your analogy, you basically need dynamic matching where one lookup determines another. Below is a multi-line string with tags (surrounded by []). You ultimately want the text after [tag4] and [tag8] but the only identifier you have is middle.

$data = @'
[tag1]
title=my file
author=admin
[tag2]
top=tag3
middle=tag4
bottom=tag5
[tag3]
this is the top table
[tag4]
this is the middle table
[tag5]
this is the bottom table
[tag6]
middle=tag8
[tag7]
tag 7 data
[tag8]
this is tag 8
'@

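# (?s) lets . span newlines; (?m) makes ^ and $ match at line breaks.
# Group 1 is the tag named on a "middle=" line; \1 then finds that [tag] header,
# and group 2 is the line right after it. \r? tolerates CRLF line endings.
$data |
    Select-String -Pattern '(?sm)^middle=(.*?)\r?$.*\[\1\]\r?\n(.*?)\r?$' -AllMatches |
    ForEach-Object {
        $_.Matches | ForEach-Object {
            $_.Groups[2].Value
        }
    }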


Output:

this is the middle table
this is tag 8
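
If the single back-referencing pattern is hard to read, the same "one lookup determines another" idea can be split into two steps. This is only a sketch: it assumes every middle= value names a [tag] header that sits on its own line, and it uses a trimmed-down copy of the sample data so it runs by itself. The [regex]::Escape call guards against tag names that contain regex metacharacters.

```powershell
# Trimmed-down copy of the sample data (an assumption, for a self-contained run).
$data = @'
[tag2]
top=tag3
middle=tag4
[tag3]
this is the top table
[tag4]
this is the middle table
[tag6]
middle=tag8
[tag8]
this is tag 8
'@

$results = foreach ($m in [regex]::Matches($data, '(?m)^middle=(\S+?)\r?$')) {
    # Step 1: the tag name found on a "middle=" line.
    $tag = [regex]::Escape($m.Groups[1].Value)

    # Step 2: build a second pattern from that name and grab the line after its header.
    $pattern = '(?m)^\[{0}\]\r?\n(.*?)\r?$' -f $tag
    $hit = [regex]::Match($data, $pattern)
    if ($hit.Success) { $hit.Groups[1].Value }
}

$results
```

Both steps work on the whole string at once, so no flags or per-line loop are needed in the script.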