Author Topic: Algorithm question - even splitting

Algorithm question - even splitting
« on: 13 March, 2021, 10:37:39 am »
OK suggestions for a good way to design this algorithm:
Suppose I have a set of 350 "items" {i1...i350}.
Each item has a certain weight, from say 1 g to 40 g, but they average somewhere in between - say 10 g - so the total weight of them all is approx 3.5 kg.
The weight of each item is known in advance and doesn't change.
But some might be 1g, some are 20g. The distribution of weights throughout the items is essentially arbitrary/random - so you don't get 'trends' of weights increasing then dropping off as you scan forward through the items.

I want to split the 350 items into say 10 sub-groups such that each subgroup is approx 350g - but doesn't have to be exact.
But I want the subgroups to be as evenly balanced as possible, by weight.

The items have to remain in order - so it's basically like - I have to make 10 cuts - and it's just a case of where to make them.
And the items obviously can't be further subdivided.
So I could say make a cut at item 6, so that in the first group I have items {i1...i5}, in the second group items {i6...}, etc.
What's more, they are in "ring" fashion, so they loop round - so one group could be items {i348, i349, i350, i1, i2} - but it might make things easier to assume that one split is always going to be between item i350 and item i1.


Clues? I can do a sort of brute-force approach and churn through a load of combinations and remember the "balanced-ness" of each potential splitting option but just wondered if anybody knows a more high-brow approach.

Re: Algorithm question - even splitting
« Reply #1 on: 13 March, 2021, 10:55:04 am »
Do you have to find the *best* solution or merely a reasonably good one in a reasonable time?

If the latter then simulated annealing (or some lowbrow variation) would do the job.


Re: Algorithm question - even splitting
« Reply #3 on: 13 March, 2021, 11:13:53 am »
Sorry ignore that - did not see your “items must stay in order”

Re: Algorithm question - even splitting
« Reply #4 on: 13 March, 2021, 11:40:40 am »
My first thought is to worry about the biggest items first.

Suppose there are indeed n items and you want them in p piles of weight W and the biggest item is weight b

If the original data is a[n],
I would generate s[n], where s[i] is the sum of the first i elements of a, take the values modulo (W - b), and look for the p smallest values.

... but that might not work when you actually try it!
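A loose reading of this prefix-sum idea, sketched in Python: rather than the (W - b) modulus, this version simply cuts where the running total comes closest to each multiple of the ideal group weight W. All names are my own.

```python
from bisect import bisect_left

def cuts_near_targets(weights, groups):
    """Heuristic: cut where the running total comes closest to each
    multiple of the ideal group weight W = total / groups."""
    prefix = [0]
    for w in weights:
        prefix.append(prefix[-1] + w)
    W = prefix[-1] / groups
    cuts = []
    for k in range(1, groups):
        target = k * W
        # prefix is sorted (weights are positive), so binary search works
        i = bisect_left(prefix, target)
        # choose whichever neighbouring prefix sum is closer to the target
        if i > 0 and target - prefix[i - 1] <= prefix[i] - target:
            i -= 1
        cuts.append(i)
    return sorted(set(cuts))
```

One pass of this is O(n + groups log n); it may merge two cuts onto the same index if a single huge item straddles two targets, which is exactly the failure mode the post anticipates.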

Re: Algorithm question - even splitting
« Reply #5 on: 13 March, 2021, 12:11:18 pm »
Quote
The items have to remain in order - so it's basically like - I have to make 10 cuts - and it's just a case of where to make them.
That is a significant restriction and a significant simplification of the problem.

You can add up all the items' weights, divide by 10, and then just work through the list and cut off as close to the average as possible.

For instance if the total is 3500 g, each group should be 350 g. For each new item, see if adding it takes the total closer to 350 g or further away. If adding an item would take you further from 350g, don't do it, cut off the group and start again.

The first items in the group will obviously move the total closer to 350 g. After a load of items, suppose you get to a total of 340 g - that's 10 g under target. If the next item is 20 g or more, adding it would leave the total 10 g or more away from 350 g, so cut off the group. If the next item is 19 g or less, adding it will bring the total closer to 350 g, so it should be added to the group.

I think that there is a danger that the last one may be over size or under size.
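A minimal Python sketch of this greedy pass (my naming; it caps the number of cuts at groups - 1 so that, as the post warns, any accumulated error lands in the final group):

```python
def greedy_cuts(weights, groups):
    """Greedy pass: keep adding items while doing so moves the running
    group weight closer to the target; otherwise cut and start a new
    group. Leftover error accumulates in the final group."""
    target = sum(weights) / groups
    result, current, total = [], [], 0
    for w in weights:
        # Cut here only if adding this item moves us further from the
        # target, the group is non-empty, and we still have cuts to spend.
        if (current and abs(total + w - target) > abs(total - target)
                and len(result) < groups - 1):
            result.append(current)
            current, total = [], 0
        current.append(w)
        total += w
    result.append(current)
    return result
```

A single forward pass, so O(n); the price is that all the rounding error from the earlier decisions is dumped into the last group.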

Re: Algorithm question - even splitting
« Reply #6 on: 13 March, 2021, 03:02:19 pm »
Don't you have to define more precisely first what "As evenly balanced as possible by weight" means? That might point to an algorithm?

For example, because of the rules that you have set out, there might be a solution in which one group weighs 800 g but all the other nine weigh 300 g each. Is that better than one where half the groups weigh 300 g and the other half 400 g? And what about a third, where one group weighs 5 g, one weighs 695 g, and all the others are spot on 350 g? Which is best? Is there a maximum acceptable variation from the mean for any one group?

Re: Algorithm question - even splitting
« Reply #7 on: 13 March, 2021, 03:38:10 pm »
Are you measuring "evenly divided" by calculating standard deviation, or something else? What are your criteria?

As to the start of the algorithm. As it’s a ring design you should pick where to start randomly. I’d then just take a slice every 35 items as a starting point. I’d then pick the slice with the biggest deviation from your 350g, then adjust its slice start to try and reduce the standard deviation.   That will affect two slice sizes.  Then try slice with next biggest deviation that hasn’t already been adjusted.  Once all slices have been adjusted you stop.  All slice starts should have been adjusted (which may mean leaving it where it was). You can then run it with random starts however many times you want. The version that produces the lowest SD is your solution.
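A sketch of this restart-and-adjust heuristic in Python (the restart count, the one-item nudge size, and all names are my own choices):

```python
import random
import statistics

def ring_slices_heuristic(weights, groups, restarts=20, seed=0):
    """Pick a random ring start, cut every n/groups items, then nudge
    each boundary one item at a time while that lowers the standard
    deviation of the group weights. Keep the best of several restarts."""
    n = len(weights)
    rng = random.Random(seed)

    def group_weights(start, bounds):
        ring = weights[start:] + weights[:start]
        return [sum(ring[a:b]) for a, b in zip([0] + bounds, bounds + [n])]

    def sd(start, bounds):
        return statistics.pstdev(group_weights(start, bounds))

    best_sd, best = float("inf"), None
    for _ in range(restarts):
        start = rng.randrange(n)
        bounds = [i * n // groups for i in range(1, groups)]
        improved = True
        while improved:
            improved = False
            for j in range(len(bounds)):
                for step in (-1, 1):
                    trial = bounds[:j] + [bounds[j] + step] + bounds[j + 1:]
                    lo = bounds[j - 1] if j else 0
                    hi = bounds[j + 1] if j + 1 < len(bounds) else n
                    # keep boundaries strictly ordered and non-empty groups
                    if lo < trial[j] < hi and sd(start, trial) < sd(start, bounds):
                        bounds, improved = trial, True
        if sd(start, bounds) < best_sd:
            best_sd, best = sd(start, bounds), (start, bounds)
    return best, best_sd
```

Each restart is a plain hill climb, so it can get stuck in a local optimum; the random starting offsets are what give it a chance of escaping, as the post says.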

Re: Algorithm question - even splitting
« Reply #8 on: 13 March, 2021, 06:36:10 pm »
Quote
Do you have to find the *best* solution or merely a reasonably good one in a reasonable time?

If the latter then simulated annealing (or some lowbrow variation) would do the job.

No, it doesn't necessarily have to be the absolute best, just reasonably even - one where the total weights of the groups are within a reasonable proportion of each other, say 10%, would be good.

(I'll google "simulated annealing" as I've no idea what it is, unless you have a recommended resource?)

Quote
Are you measuring "evenly divided" by calculating standard deviation, or something else? What are your criteria?

As to the start of the algorithm. As it’s a ring design you should pick where to start randomly. I’d then just take a slice every 35 items as a starting point. I’d then pick the slice with the biggest deviation from your 350g, then adjust its slice start to try and reduce the standard deviation.   That will affect two slice sizes.  Then try slice with next biggest deviation that hasn’t already been adjusted.  Once all slices have been adjusted you stop.  All slice starts should have been adjusted (which may mean leaving it where it was). You can then run it with random starts however many times you want. The version that produces the lowest SD is your solution.

That sounds good.
What I keep thinking of, though, is that I have to look at the situation as a whole.
So suppose I have cuts after A, B, C, D, E and F, where those are item numbers from 1-350. If I pick on B to start with, set it randomly and adjust it so that A->B and B->C have very low SD, then I might be setting myself up for a situation where E falls bang smack in the middle of a massive item, and it's going to have to go either side of it - meaning either D->E or E->F has high SD.

Quote
Don't you have to define more precisely first what "As evenly balanced as possible by weight" means? That might point to an algorithm?
OK, so that would be that the weight of the largest group is as small as possible.
So yes in terms of it pointing to an algorithm, thinking about it like that does give a good metric as to how "good" a particular solution is.


Quote
For example, because of the rules that you have set out, there might be a solution in which one group weighs 800g but all the other nine weigh 300g each. Is that better than half weighing 300g and other half 400g?

No, the latter situation is much better, as the largest group is only 400g, rather than 800g.

Quote
And a third where one group weighs 5g, one 695g, and all the others are spot on 350g. Which is best? Is there a maximum acceptable variation from the mean for any one group?
That's better than one with an 800g group, but not as good as one where half weigh 300g and half 400g.



Re: Algorithm question - even splitting
« Reply #9 on: 13 March, 2021, 06:39:25 pm »
Quote
You can add up all the items' weights, divide by 10, and then just work through the list and cut off as close to the average as possible. [...] I think that there is a danger that the last one may be over size or under size.

Interesting, yes that gives another good starting point, thanks.

Re: Algorithm question - even splitting
« Reply #10 on: 13 March, 2021, 06:51:14 pm »
Quote
What I keep thinking of though is I have to look at the situation as a whole. [...] I might be setting myself up for a situation where E is going to fall bang smack in the middle of a massive item, and it's going to have to go either side of it - meaning either D->E or E->F has high SD.

This is where picking a random start position works. Run it a few times and you'll get past that. The more you run it, the closer to optimal the best discovered solution will be.

Re: Algorithm question - even splitting
« Reply #11 on: 13 March, 2021, 06:52:42 pm »
Quote
(I'll google "simulated annealing" as I've no idea what it is, unless you have a recommended resource?)

You start by choosing some arbitrary boundaries and give it a score. You then randomly vary the boundaries by some amount and see if it improves the score. If it does, keep it. Repeat until bored.

The annealing part is that you start by randomly varying the boundaries by a lot and then with each successive round you reduce the amount they can be varied.

The idea is that you try to find a ballpark solution in the early rounds and zero in on a local optimum in the later rounds.

I've never found a use for it, but AIUI it produces results almost as good as brute forcing with orders of magnitude less computation.
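A Python sketch of the scheme as described - the greedy shrinking-step variant, scoring a split by its heaviest group per the definition agreed upthread. (Textbook simulated annealing would also accept some worsening moves early on; this version only keeps improvements.) All names are my own:

```python
import random

def cooling_search(weights, groups, rounds=200, seed=1):
    """Start from random cut positions, jiggle them, keep improvements,
    and shrink the jiggle size each round. Score = heaviest group."""
    n = len(weights)
    rng = random.Random(seed)
    prefix = [0]
    for w in weights:
        prefix.append(prefix[-1] + w)

    def score(cuts):
        bounds = [0] + cuts + [n]
        return max(prefix[b] - prefix[a] for a, b in zip(bounds, bounds[1:]))

    cuts = sorted(rng.sample(range(1, n), groups - 1))
    best = score(cuts)
    for r in range(rounds):
        spread = max(1, n // (2 + r))  # perturbation shrinks over time
        trial = sorted({min(n - 1, max(1, c + rng.randint(-spread, spread)))
                        for c in cuts})
        # skip trials where two cuts collided onto the same position
        if len(trial) == groups - 1 and score(trial) < best:
            cuts, best = trial, score(trial)
    return cuts, best
```

With prefix sums, each candidate is scored in O(groups), so hundreds of rounds cost next to nothing compared with exhaustive search.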

Re: Algorithm question - even splitting
« Reply #12 on: 13 March, 2021, 06:59:42 pm »
This is like advent, in lent!

Feanor

Re: Algorithm question - even splitting
« Reply #13 on: 13 March, 2021, 07:01:41 pm »
Depends how scalable it needs to be.
For 350 items into 10 bins, I would not hesitate to brute-force it.

My approach would be similar to that outlined by Diver300.
I'd maintain a 350 x 10 array to hold the results.
For each starting point [0..349], I'd store the resulting 10 bin weights so we can do stats on them at the end.

Get the total weight, divide by 10 to get the ideal bin weight.

Choose a starting point.

Add items sequentially till you go over the nominal bin weight.
Then, we have to make a decision.
Include the item that pushed us over the limit, for an overweight bin, or exclude it for an under-weight bin?
So perhaps go with what gives us the closest.

Move to the next bin, and repeat.

Once you've got all 10 bins, increment the starting point and do it all again.

As Diver300 noted, cumulative errors will all end up in the last bin.
In an 'ideal' dataset, the under/over decisions and magnitudes will average out so the cumulative error added to the final bin should be small.
But these are small datasets, and there's not enough data for things to average out 'long term', so it's likely the last bin will be significantly over/under.

My proposed solution to this would be:
Divide the total by 10, then fill the first bin to this value +/- one item per the initial idea, but then going forward to the second bin...
Subtract all assigned items from the total to get the remainder, then divide that by 9, and fill the next bin to this value etc etc.
Continue to calculate the remainder, then divide it amongst the remaining bins.
This way, we wash out the errors over the course of the calculations.

At the end, once we have iterated over all 350 possible starting points and the array is full, do stats to calculate the mean and SD of the resulting 10 bins, and choose your favourite.
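A Python sketch of this plan - sweep every starting point, fill each bin against a target recomputed from the remaining weight, and keep the start whose heaviest bin is lightest. All names are my own:

```python
def sweep_starts(weights, groups):
    """For every ring starting point, fill bins greedily against a
    target recomputed as remainder / bins-left, washing out the error
    as we go. Keep the start whose heaviest bin is lightest."""
    n = len(weights)
    best = None
    for start in range(n):
        ring = weights[start:] + weights[:start]
        bins, i = [], 0
        remaining = sum(ring)
        for b in range(groups, 0, -1):   # b = bins still to fill
            target = remaining / b
            total = 0
            # take at least one item, keep going while it helps, and
            # always leave at least one item for each later bin
            while i < n - (b - 1) and (total == 0 or
                    abs(total + ring[i] - target) <= abs(total - target)):
                total += ring[i]
                i += 1
            bins.append(total)
            remaining -= total
        if best is None or max(bins) < max(best[1]):
            best = (start, bins)
    return best
```

Each starting point costs O(n), so the whole sweep is O(n^2) - trivial for 350 items, exactly as the post says.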



Re: Algorithm question - even splitting
« Reply #14 on: 23 March, 2021, 12:15:00 pm »
The actual data follows (there are actually 360 items not 350)

The best that my brute force approach has found yet is to make splits before items 81, 112, 183, 204, 239 and 297.
81 <= Item < 112 is the largest group, at 31547360, which is only 0.64% above the mean of 31345243.3333333 - and the mean is the lowest possible weight for the heaviest group.

This is more than good enough for the real purpose - but it's academically interesting whether it's possible to prove that no better solution exists.
That 0.64% sounds low, but it is still over 200,000 "weight units" - more than many of the items weigh. So a better solution is not obviously impossible... but it could be.
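On the academic question: with one cut fixed (say between the last and first item, as suggested in the opening post), minimising the heaviest group is the classic "split array largest sum" problem, which can be solved exactly - and hence proved optimal - by binary-searching the answer with a greedy feasibility check. The ring version follows by taking the best result over each rotation. A sketch, with names my own:

```python
def min_heaviest_group(weights, groups):
    """Exact answer for the linear (non-ring) version: binary-search the
    heaviest-group weight, with a greedy check asking 'can the list be
    covered in <= groups contiguous runs, none exceeding cap?'"""
    def feasible(cap):
        runs, total = 1, 0
        for w in weights:
            if w > cap:
                return False
            if total + w > cap:       # start a new run
                runs += 1
                total = w
            else:
                total += w
        return runs <= groups

    lo, hi = max(weights), sum(weights)
    while lo < hi:
        mid = (lo + hi) // 2
        if feasible(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo
```

Each feasibility check is O(n) and the search takes O(log(total)) steps, so running it over all 360 rotations of the ring is cheap - enough to settle whether the brute-force result above is actually optimal.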

Code:
Item      Weight
0 1132
1 1032
2 348
3 1820
4 4056
5 1288
6 1292
7 1424
8 3556
9 3192
10 1008
11 416
12 44
13 896
14 1188
15 664
16 660
17 1584
18 1120
19 1024
20 6744
21 8800
22 15872
23 9988
24 11640
25 2200
26 480
27 1916
28 8364
29 3700
30 56960
31 2396
32 8600
33 1728
34 3368
35 924
36 868
37 688
38 900
39 1524
40 2016
41 1524
42 1512
43 1924
44 5832
45 3788
46 3652
47 4708
48 3044
49 2176
50 2724
51 5704
52 7348
53 10352
54 14404
55 78568
56 331168
57 883472
58 478868
59 312204
60 320944
61 372392
62 504524
63 369844
64 270056
65 292020
66 219480
67 372960
68 497080
69 269288
70 229012
71 260820
72 307960
73 504684
74 511076
75 592580
76 555856
77 522320
78 674164
79 1091756
80 1323440
81 1328924
82 1352300
83 1118820
84 930940
85 811060
86 881984
87 784212
88 625204
89 883012
90 956988
91 963188
92 964892
93 1037880
94 950936
95 1238212
96 1102848
97 1288724
98 1299400
99 1324176
100 1274024
101 1029972
102 1130212
103 1263272
104 1220560
105 1042612
106 1120600
107 879384
108 1112196
109 801556
110 387852
111 441420
112 329004
113 444944
114 415596
115 449988
116 372900
117 204640
118 311444
119 303512
120 184188
121 457460
122 235464
123 204116
124 254184
125 357040
126 315196
127 360700
128 536272
129 439764
130 802852
131 773928
132 768372
133 901272
134 476872
135 560284
136 652340
137 683916
138 543720
139 639232
140 501248
141 524720
142 426368
143 327092
144 492160
145 114032
146 12
147 564
148 1632
149 12
150 12
151 11052
152 6056
153 12
154 13160
155 5724
156 10748
157 8052
158 32800
159 8084
160 5188
161 7580
162 133916
163 280032
164 199568
165 73372
166 153664
167 128616
168 100460
169 122920
170 425508
171 1151856
172 933348
173 791808
174 762816
175 875588
176 1126372
177 1347180
178 1877456
179 1816560
180 1150096
181 1155896
182 1408832
183 1320388
184 1546296
185 1741364
186 1761356
187 2198132
188 2303600
189 2246852
190 2152184
191 1680408
192 1793600
193 1868812
194 1437292
195 1432648
196 1041160
197 820836
198 1045432
199 879084
200 949684
201 979204
202 987016
203 1057920
204 939516
205 1104596
206 1027668
207 1484200
208 1543024
209 1700360
210 2463932
211 2307188
212 1763264
213 1480124
214 1491436
215 1730368
216 1138448
217 1148156
218 864980
219 996964
220 488236
221 393840
222 396656
223 601688
224 816032
225 538976
226 515960
227 483028
228 489772
229 405884
230 402968
231 526296
232 249140
233 155468
234 239948
235 374224
236 211648
237 206400
238 170820
239 182876
240 145760
241 113800
242 64396
243 78316
244 130876
245 135128
246 171940
247 300440
248 228140
249 334160
250 236412
251 278600
252 418648
253 724812
254 780616
255 857924
256 1129380
257 1354996
258 1046576
259 888264
260 914164
261 439672
262 231076
263 268620
264 207468
265 400996
266 245540
267 174964
268 360504
269 214828
270 229152
271 197416
272 125640
273 87768
274 197564
275 331368
276 340552
277 239764
278 470260
279 663372
280 1193908
281 870764
282 838108
283 890148
284 788852
285 1144576
286 1705632
287 1142672
288 914532
289 673800
290 1033688
291 688868
292 860152
293 818508
294 680048
295 720644
296 636904
297 499008
298 409748
299 701120
300 1160232
301 768988
302 321644
303 302440
304 390964
305 380432
306 467196
307 534496
308 276128
309 272528
310 927812
311 553488
312 491736
313 578332
314 452140
315 890760
316 974664
317 757172
318 867588
319 1759988
320 1282620
321 528368
322 204020
323 182888
324 244244
325 235596
326 105868
327 93132
328 65104
329 157176
330 157572
331 180108
332 140672
333 142844
334 1544
335 3412
336 1948
337 1972
338 8136
339 3484
340 3016
341 1744
342 900
343 812
344 4648
345 6684
346 13700
347 8496
348 17160
349 14944
350 20624
351 15084
352 52712
353 28568
354 140728
355 81220
356 52976
357 31528
358 18216
359 7980