Author Topic: Parsing text into Excel challenge  (Read 434 times)

mattc

  • n.b. have grown beard since photo taken
    • Didcot Audaxes
Parsing text into Excel challenge
« on: January 13, 2021, 09:54:49 pm »
[ ... or "Baptism of our Lord Jesus Christ Festival of Code"]
Declaration: this is a work problem. I do not work for charidee or the NHS. You could spend your time more virtuously answering anyone else's question.

I have a PDF of some requirements from a Gov agency, and we want it in 2 columns of an Excel sheet. So the output will look like

14.5.6.7.2.1  <new cell> The thing must not explode in public.
14.5.6.7.2.1.1  <new cell> It must understand Welsh
14.5.6.7.2.1.2  <new cell> It shall not be pink or yellow.


Here is sometext in the input PDF's format:

Code: [Select]
6.2   A person who contravenes the requirement in regulation 3 commits an offence.

6.2.1    A person who obstructs, without reasonable [F1excuse], any person carrying out a function under these Regulations commits an offence.

6.2.2     A person who, without reasonable excuse, contravenes a direction given under regulation 5(2) or regulation 5(5) commits an offence.

6.3 An offence under this regulation is punishable on summary conviction by a fine.

6.4 Section 24 of the Police and Criminal Evidence Act 1984 F2 applies in relation to an offence under this regulation as if the reasons in subsection (5) of that section included—

   6.4.1  to maintain public health;
   6.4.2  to maintain public order.     

6.5 it's an offence to have any fun

We *ONLY* care about the first number clause ID: the rest is - to us - free text.

Go! (and thanks for reading x)
Has never ridden RAAM
---------
No.11  Because of the great host of those who dislike the least appearance of "swank " when they travel the roads and lanes. - From Kuklos' 39 Articles

Re: Parsing text into Excel challenge
« Reply #1 on: January 13, 2021, 10:00:36 pm »
Use FIND to find whatever the space character is after the number block.

Then use MID(cell ref,1,number found above-1). That will give you the number part
Then use MID(cell ref,number found above+1,200). That will return the text part.
Rust never sleeps

mattc

  • n.b. have grown beard since photo taken
    • Didcot Audaxes
Re: Parsing text into Excel challenge
« Reply #2 on: January 13, 2021, 10:04:31 pm »
Use FIND to find whatever the space character is after the number block.

Then use MID(cell ref,1,number found above-1). That will give you the number part
Then use MID(cell ref,number found above+1,200). That will return the text part.
Thanks!

I did know this was possible - what's the easiest way to automate it? Can I just embed the FIND and the MID into 1 formula?
Has never ridden RAAM
---------
No.11  Because of the great host of those who dislike the least appearance of "swank " when they travel the roads and lanes. - From Kuklos' 39 Articles

Re: Parsing text into Excel challenge
« Reply #3 on: January 13, 2021, 10:07:15 pm »
You can. Hold on.
Rust never sleeps

Re: Parsing text into Excel challenge
« Reply #4 on: January 13, 2021, 10:15:11 pm »
If data in cell A1 is

1.1.1.2   Text to parse

Cell B1 formula is
=MID(A1,1,FIND(" ",A1,1)-1)

Cell C1 formula is
=MID(A1,FIND(" ",A1,1),200)

That is assuming the space after the number is a blank space.

C1 will however return leading spaces if there is more than one space in the original text, so use TRIM

=TRIM(MID(A1,FIND(" ",A1,1),200))

Sorry, edited.
Rust never sleeps

mattc

  • n.b. have grown beard since photo taken
    • Didcot Audaxes
Re: Parsing text into Excel challenge
« Reply #5 on: January 13, 2021, 10:35:27 pm »
Mr Hatler, you are a proper gent.

x

p.s. I won't be doing this until at least 0800, so you didn't have to rush ;)
Has never ridden RAAM
---------
No.11  Because of the great host of those who dislike the least appearance of "swank " when they travel the roads and lanes. - From Kuklos' 39 Articles

Re: Parsing text into Excel challenge
« Reply #6 on: January 13, 2021, 11:54:43 pm »
Nae worries.  If I'd waited until tomorrow morning then : -

a)  I wouldn't have been awake at 8am and
b)  Work would have started and I wouldn't have surfaced until gone 6pm.
Rust never sleeps

Re: Parsing text into Excel challenge
« Reply #7 on: January 13, 2021, 11:56:40 pm »
It might be that the space after the number block is a tab, in which case you would have to copy that 'space' and paste that into the empty quote marks in the formulae (I think that would work).

Failing that, use Word to do the find and replace thing first.
Rust never sleeps

Pingu

  • Put away those fiery biscuits!
  • Mrs Pingu's domestique
    • the Igloo
Re: Parsing text into Excel challenge
« Reply #8 on: January 14, 2021, 09:59:41 am »
I think the CLEAN function will deal with tabs.

Re: Parsing text into Excel challenge
« Reply #9 on: January 14, 2021, 10:25:42 am »
If the space character is a Tab then it may well paste into separate cells in Excel anyway. But good call if it's not a space and it doesn't split out.
Rust never sleeps

Mr Larrington

  • A bit ov a lyv wyr by slof standirds
  • Custard Wallah
    • Mr Larrington's Automatic Diary
Re: Parsing text into Excel challenge
« Reply #10 on: January 14, 2021, 10:47:57 am »
I think the CLEAN function will deal with tabs.

(Asks Mr Google)

Oh, I say!  That could be rather useful to this Unit.  Ta!
External Transparent Wall Inspection Operative & Mayor of Mortagne-au-Perche
Satisfying the Bloodlust of the Masses in Peacetime

Re: Parsing text into Excel challenge
« Reply #11 on: January 14, 2021, 01:17:23 pm »
If Excel 2010 or later there is a data —> text to columns menu option.

You can pick space, tab, semi colon, comma, or any other character as the delimiters.

Jaded

  • The Codfather
  • Formerly known as Jaded
Re: Parsing text into Excel challenge
« Reply #12 on: January 14, 2021, 01:32:01 pm »
Which is OK if the character you want to slip the text at doesn't occur elsewhere in the text where you don't want to split it. Spaces often occur elsewhere.

Tabs, not so much, and sometimes Excel looks at a Tab and splits the data there anyway, when importing it.
If you don't like your democracy, vote against it.

Re: Parsing text into Excel challenge
« Reply #13 on: January 14, 2021, 02:48:59 pm »
If Excel 2010 or later there is a data —> text to columns menu option.

You can pick space, tab, semi colon, comma, or any other character as the delimiters.

That functionality goes back at least to Excel 2000.

Re: Parsing text into Excel challenge
« Reply #14 on: January 14, 2021, 03:05:05 pm »
Might be simpler to import the pdf into Word to process the text then export to Excel.  All could be done within Execl VBA is required.  Do you have a sample pdf to have a play with.