Topic: A little python help please

Re: A little python help please
« Reply #25 on: 18 March, 2018, 10:35:21 pm »
> There seem to be 9 million different ways you could possibly achieve any given goal
That's a feature.  :P

I'll +1 the use of Beautiful Soup, unless your goal is to learn how to use regexes.

Re: A little python help please
« Reply #26 on: 19 March, 2018, 08:12:22 am »
>> A couple of points. Regular expressions match "longest leftmost", so if there are multiple sets of marker pairs you will need to handle that differently to the case where there is only one. I'm not sure about your "character class" in the middle: do you really not want to match when there's a digit between your markers? If you want to match everything, then say so: (.+)
> I'm not sure what you mean. That expression successfully matches on strings that contain integer characters. The \w matches on alphanumerics.

Oops - yes you're right, I didn't realise that \w (word characters) includes digits and underscore. I have learnt something (and I should have taken my own advice to use a regex checker!)

What I was trying to say is, do you really want to fail to match your docstart/docend when there are any other characters not in your list, like a comma, question mark, single quote, etc?
Quote from: tiermat
that's not science, it's semantics.
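
For anyone without a regex checker to hand, the \w point above can be confirmed in the interpreter. This is only a minimal sketch, and the sample strings are made up for illustration:

```python
import re

# \w covers letters, digits and underscore, so the whole string matches.
word_match = re.fullmatch(r"\w+", "file_2018")
print(word_match is not None)

# Punctuation such as a comma, question mark or single quote
# is neither \w nor \s, so it shows up here.
leftovers = re.findall(r"[^\w\s]", "a, b? c'd")
print(leftovers)
```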

Re: A little python help please
« Reply #27 on: 19 March, 2018, 08:29:39 am »
> What I was trying to say is, do you really want to fail to match your docstart/docend when there are any other characters not in your list, like a comma, question mark, single quote, etc?
There shouldn't be.
The bit between the docstart/docend is a list of files, generated using glob.glob("*.*")
Marmite slave

Re: A little python help please
« Reply #28 on: 19 March, 2018, 09:39:03 am »
It depends where the file names came from.
If there's been scope for people typing them in, there could be any characters, including non-ASCII (66/99 quotes, em-dashes, etc.).
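
A rough way to see how a character class of this kind treats typed-in names, assuming Python 3 (where \w on a str pattern also matches accented letters). The helper `is_allowed` and the sample filenames are hypothetical, not taken from the actual release package:

```python
import re

# Character class similar to the one discussed in this thread (an assumption).
ALLOWED = re.compile(r'[\w\s<>."/_=-]*')

def is_allowed(name):
    """Return True if every character of name is inside the class."""
    return ALLOWED.fullmatch(name) is not None

samples = [
    "release_notes-1.2.txt",   # plain ASCII
    "caf\u00e9.txt",           # accented letter
    "\u201cdraft\u201d.txt",   # 66/99 curly quotes
    "a\u2014b.txt",            # em-dash
]
results = {name: is_allowed(name) for name in samples}
for name, ok in results.items():
    print(repr(name), ok)
```

Under those assumptions the accented letter passes, while the curly quotes and the em-dash are rejected.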

Re: A little python help please
« Reply #29 on: 19 March, 2018, 10:25:17 am »
If there is any of that crap in these filenames, I don't want them listing!

These are files for distribution in a release package, so the filenames are in a controlled format.

This pattern turned out to work; the previous one didn't work on Linux.

Code:
pat = re.compile(r"<!--docstart -->[\n\w\s<>.\"/_=-]*<!--docend -->$", re.IGNORECASE|re.M)
Marmite slave
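
To try the pattern above outside the real release script, something like the following sketch can be used. The HTML snippet is invented for illustration (the real file list is not shown in this thread); it assumes the markers wrap a list of links:

```python
import re

pat = re.compile(r"<!--docstart -->[\n\w\s<>.\"/_=-]*<!--docend -->$",
                 re.IGNORECASE | re.M)

# Hypothetical page content: two links between the markers.
good = """<html>
<!--docstart -->
<a href="docs/readme_v1.2.txt">readme_v1.2.txt</a>
<a href="docs/install-guide.pdf">install-guide.pdf</a>
<!--docend -->
</html>
"""
# Same content, but with a comma in one filename.
bad = good.replace("install-guide.pdf", "install, guide.pdf")

good_match = pat.search(good)
bad_match = pat.search(bad)
print(good_match is not None, bad_match is not None)
```

The class matches the controlled-format names, and a single character outside the list (here a comma) stops the whole block from matching. Note that \s already includes \n, so the explicit \n in the class is redundant but harmless.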

Re: A little python help please
« Reply #30 on: 19 March, 2018, 12:26:20 pm »
People can only go on what you've given them, and with little context (examples, etc.) they're going to question some of the decisions made, e.g. the character class:

Code:
pat = re.compile(r"<!--docstart -->[\n\w\s<>.\"/_=-]*<!--docend -->$", re.IGNORECASE|re.M)
when most people would just expect to write:

Code:
pat = re.compile(r"<!--docstart -->.+<!--docend -->$", re.IGNORECASE|re.M)
It all depends on information they don't have, e.g. whether any lines are likely to contain extra docstart or docend tags, whether some lines should be rejected because of invalid/unwanted data, etc.

Without that context, a character class such as [\n\w\s<>.\"/_=-] is a red flag to me, especially with the \n in there (before the regex was in multi-line mode). Another red flag is the lack of a start-of-string anchor given the presence of an end-of-string anchor; in general, use both or neither (but, again, I don't know the full context).

If it's just a set of lines that need converting and the list will be checked afterwards, then there's little point over-engineering a one-off task. If it works, move on. If it's something that will be run again and again (as part of a release), then I'd expect something a lot more defensively minded.
"Yes please" said Squirrel "biscuits are our favourite things."
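
On the question of extra docstart/docend tags: Python's re takes the leftmost match and its quantifiers are greedy by default, so a "match everything" pattern in re.S mode will run from the first docstart to the last docend. A small sketch of the difference, using made-up text with two marker pairs:

```python
import re

# Hypothetical input containing two separate marker pairs.
text = (
    "<!--docstart -->\na.txt\n<!--docend -->\n"
    "other stuff\n"
    "<!--docstart -->\nb.txt\n<!--docend -->\n"
)

# Greedy: one match spanning both pairs and everything in between.
greedy = re.compile(r"<!--docstart -->(.+)<!--docend -->", re.S)
# Non-greedy: stops at the first docend after each docstart.
lazy = re.compile(r"<!--docstart -->(.+?)<!--docend -->", re.S)

greedy_blocks = greedy.findall(text)
lazy_blocks = lazy.findall(text)
print(len(greedy_blocks), len(lazy_blocks))
```

Whether the non-greedy form is the right choice depends on context that isn't in this thread; it is only one of several ways to handle repeated markers.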

Re: A little python help please
« Reply #31 on: 19 March, 2018, 12:54:05 pm »
A Python programmer here had a look and suggested using .*

It didn't match, to his surprise. We tried a few versions of that. Only the pattern I've put up matches every time.
Marmite slave

vorsprung

Re: A little python help please
« Reply #32 on: 24 March, 2018, 09:29:56 pm »
> A Python programmer here had a look and suggested using .*
>
> It didn't match, to his surprise. We tried a few versions of that. Only the pattern I've put up matches every time.

Unless you are in re.S (DOTALL) mode, . doesn't match \n. Tell your programmer that.
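
A minimal sketch of that behaviour, using an invented multi-line sample rather than the real file list:

```python
import re

# Hypothetical text: the markers are on separate lines from the file names.
text = "<!--docstart -->\nfile_a.txt\nfile_b.txt\n<!--docend -->\n"

pattern = r"<!--docstart -->.*<!--docend -->$"

# Without re.S, "." stops at the first newline, so nothing matches.
no_dotall = re.search(pattern, text, re.IGNORECASE | re.M)
# With re.S, "." also matches "\n" and the block is found.
dotall = re.search(pattern, text, re.IGNORECASE | re.M | re.S)

print(no_dotall is not None, dotall is not None)
```

That would explain why the character class containing \s worked where .* did not: \s matches newlines regardless of flags.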