Author Topic: Parsing text into Excel challenge  (Read 1274 times)

mattc

  • n.b. have grown beard since photo taken
    • Didcot Audaxes
Parsing text into Excel challenge
« on: 13 January, 2021, 09:54:49 pm »
[ ... or "Baptism of our Lord Jesus Christ Festival of Code"]
Declaration: this is a work problem. I do not work for charidee or the NHS. You could spend your time more virtuously answering anyone else's question.

I have a PDF of some requirements from a Gov agency, and we want it in 2 columns of an Excel sheet. So the output will look like

14.5.6.7.2.1  <new cell> The thing must not explode in public.
14.5.6.7.2.1.1  <new cell> It must understand Welsh
14.5.6.7.2.1.2  <new cell> It shall not be pink or yellow.


Here is sometext in the input PDF's format:

Code: [Select]
6.2   A person who contravenes the requirement in regulation 3 commits an offence.

6.2.1    A person who obstructs, without reasonable [F1excuse], any person carrying out a function under these Regulations commits an offence.

6.2.2     A person who, without reasonable excuse, contravenes a direction given under regulation 5(2) or regulation 5(5) commits an offence.

6.3 An offence under this regulation is punishable on summary conviction by a fine.

6.4 Section 24 of the Police and Criminal Evidence Act 1984 F2 applies in relation to an offence under this regulation as if the reasons in subsection (5) of that section included—

   6.4.1  to maintain public health;
   6.4.2  to maintain public order.     

6.5 it's an offence to have any fun

We *ONLY* care about the first number clause ID: the rest is - to us - free text.

Go! (and thanks for reading x)
Has never ridden RAAM
---------
No.11  Because of the great host of those who dislike the least appearance of "swank " when they travel the roads and lanes. - From Kuklos' 39 Articles

Re: Parsing text into Excel challenge
« Reply #1 on: 13 January, 2021, 10:00:36 pm »
Use FIND to find whatever the space character is after the number block.

Then use MID(cell ref,1,number found above-1). That will give you the number part
Then use MID(cell ref,number found above+1,200). That will return the text part.
Rust never sleeps

mattc

  • n.b. have grown beard since photo taken
    • Didcot Audaxes
Re: Parsing text into Excel challenge
« Reply #2 on: 13 January, 2021, 10:04:31 pm »
Use FIND to find whatever the space character is after the number block.

Then use MID(cell ref,1,number found above-1). That will give you the number part
Then use MID(cell ref,number found above+1,200). That will return the text part.
Thanks!

I did know this was possible - what's the easiest way to automate it? Can I just embed the FIND and the MID into 1 formula?
Has never ridden RAAM
---------
No.11  Because of the great host of those who dislike the least appearance of "swank " when they travel the roads and lanes. - From Kuklos' 39 Articles

Re: Parsing text into Excel challenge
« Reply #3 on: 13 January, 2021, 10:07:15 pm »
You can. Hold on.
Rust never sleeps

Re: Parsing text into Excel challenge
« Reply #4 on: 13 January, 2021, 10:15:11 pm »
If data in cell A1 is

1.1.1.2   Text to parse

Cell B1 formula is
=MID(A1,1,FIND(" ",A1,1)-1)

Cell C1 formula is
=MID(A1,FIND(" ",A1,1),200)

That is assuming the space after the number is a blank space.

C1 will however return leading spaces if there is more than one space in the original text, so use TRIM

=TRIM(MID(A1,FIND(" ",A1,1),200))

Sorry, edited.
Rust never sleeps

mattc

  • n.b. have grown beard since photo taken
    • Didcot Audaxes
Re: Parsing text into Excel challenge
« Reply #5 on: 13 January, 2021, 10:35:27 pm »
Mr Hatler, you are a proper gent.

x

p.s. I won't be doing this until at least 0800, so you didn't have to rush ;)
Has never ridden RAAM
---------
No.11  Because of the great host of those who dislike the least appearance of "swank " when they travel the roads and lanes. - From Kuklos' 39 Articles

Re: Parsing text into Excel challenge
« Reply #6 on: 13 January, 2021, 11:54:43 pm »
Nae worries.  If I'd waited until tomorrow morning then : -

a)  I wouldn't have been awake at 8am and
b)  Work would have started and I wouldn't have surfaced until gone 6pm.
Rust never sleeps

Re: Parsing text into Excel challenge
« Reply #7 on: 13 January, 2021, 11:56:40 pm »
It might be that the space after the number block is a tab, in which case you would have to copy that 'space' and paste that into the empty quote marks in the formulae (I think that would work).

Failing that, use Word to do the find and replace thing first.
Rust never sleeps

Pingu

  • Put away those fiery biscuits!
  • Mrs Pingu's domestique
    • the Igloo
Re: Parsing text into Excel challenge
« Reply #8 on: 14 January, 2021, 09:59:41 am »
I think the CLEAN function will deal with tabs.

Re: Parsing text into Excel challenge
« Reply #9 on: 14 January, 2021, 10:25:42 am »
If the space character is a Tab then it may well paste into separate cells in Excel anyway. But good call if it's not a space and it doesn't split out.
Rust never sleeps

Mr Larrington

  • A bit ov a lyv wyr by slof standirds
  • Custard Wallah
    • Mr Larrington's Automatic Diary
Re: Parsing text into Excel challenge
« Reply #10 on: 14 January, 2021, 10:47:57 am »
I think the CLEAN function will deal with tabs.

(Asks Mr Google)

Oh, I say!  That could be rather useful to this Unit.  Ta!
External Transparent Wall Inspection Operative & Mayor of Mortagne-au-Perche
Satisfying the Bloodlust of the Masses in Peacetime

Re: Parsing text into Excel challenge
« Reply #11 on: 14 January, 2021, 01:17:23 pm »
If Excel 2010 or later there is a data —> text to columns menu option.

You can pick space, tab, semi colon, comma, or any other character as the delimiters.

Jaded

  • The Codfather
  • Formerly known as Jaded
Re: Parsing text into Excel challenge
« Reply #12 on: 14 January, 2021, 01:32:01 pm »
Which is OK if the character you want to slip the text at doesn't occur elsewhere in the text where you don't want to split it. Spaces often occur elsewhere.

Tabs, not so much, and sometimes Excel looks at a Tab and splits the data there anyway, when importing it.
It is simpler than it looks.

Re: Parsing text into Excel challenge
« Reply #13 on: 14 January, 2021, 02:48:59 pm »
If Excel 2010 or later there is a data —> text to columns menu option.

You can pick space, tab, semi colon, comma, or any other character as the delimiters.

That functionality goes back at least to Excel 2000.

Re: Parsing text into Excel challenge
« Reply #14 on: 14 January, 2021, 03:05:05 pm »
Might be simpler to import the pdf into Word to process the text then export to Excel.  All could be done within Execl VBA is required.  Do you have a sample pdf to have a play with.
Clever enough to know I'm not clever enough.