Author Topic: Algorithm question - even splitting  (Read 1652 times)

Ben T

Algorithm question - even splitting
« on: 13 March, 2021, 10:37:39 am »
OK suggestions for a good way to design this algorithm:
Suppose I have a set of 350 "items" {i1...i350}.
Each item is a certain weight, from say 1 to 40 g, but they average somewhere in between, say 10 g, so the total weight of them all is approx 3.5 kg.
The weight of each item is known in advance and doesn't change.
But some might be 1g, some are 20g. The distribution of weights throughout the items is essentially arbitrary/random - so you don't get 'trends' of weights increasing then dropping off as you scan forward through the items.

I want to split the 350 items into say 10 sub-groups such that each subgroup is approx 350g - but doesn't have to be exact.
But I want the subgroups to be as evenly balanced as possible, by weight.

The items have to remain in order - so basically I have to make 10 cuts, and it's just a case of where to make them.
And the items obviously can't be further subdivided.
So I could, say, make a cut at item 6, so that in the first group I have items {i1...i5}, in the second group I have items {i6...}, etc.
What's more, they are in "ring" fashion, so they loop round - so one group could be items {i348, i349, i350, i1, i2} - but it might make things easier to assume that one split is always going to be between item i350 and item i1.


Clues? I can do a sort of brute-force approach and churn through a load of combinations and remember the "balanced-ness" of each potential splitting option, but I just wondered if anybody knows a more high-brow approach.

Re: Algorithm question - even splitting
« Reply #1 on: 13 March, 2021, 10:55:04 am »
Do you have to find the *best* solution or merely a reasonably good one in a reasonable time?

If the latter then simulated annealing (or some lowbrow variation) would do the job.


Davef

Re: Algorithm question - even splitting
« Reply #3 on: 13 March, 2021, 11:13:53 am »
Sorry, ignore that - I did not see your “items must stay in order”.

Davef

Re: Algorithm question - even splitting
« Reply #4 on: 13 March, 2021, 11:40:40 am »
My first thought is to worry about the biggest items first.

Suppose there are indeed n items, you want them in p piles of weight W, and the biggest item has weight b.

If the original data is a[n], I would generate s[n], where s[i] is the sum of the first i elements of a, taken modulo (W-b), and look for the p smallest values.

... but that might not work when you actually try it!
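Something like this, as an untested Python sketch (the names are made up on the spot, and it assumes the biggest item b is lighter than the ideal pile weight W):

Code:
def candidate_cuts(a, p):
    # W = ideal pile weight, b = biggest item; assumes b < W
    W = sum(a) / p
    b = max(a)
    s, running = [], 0
    for x in a:
        running += x
        s.append(running % (W - b))      # prefix sum, taken modulo (W - b)
    # indices with the p smallest residues are the candidate cut points
    return sorted(range(len(a)), key=lambda i: s[i])[:p]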

Re: Algorithm question - even splitting
« Reply #5 on: 13 March, 2021, 12:11:18 pm »
The items have to remain in order - so basically I have to make 10 cuts, and it's just a case of where to make them.
That is a significant restriction and a significant simplification of the problem.

You can add up all the items' weights, divide by 10, and then just work through the list and cut off as close to the average as possible.

For instance if the total is 3500 g, each group should be 350 g. For each new item, see if adding it takes the total closer to 350 g or further away. If adding an item would take you further from 350g, don't do it, cut off the group and start again.

The first items in the group will obviously move the total closer to 350 g. After a load of items, if you get to a total of 340 g, that's 10 g off target. If the next item is 20 g or more, that would make the total 10 g or more away from 350 g, so cut off the group. If the next item is 19 g or less, the total will end up closer to 350 g, so it should be added to the group.

I think there is a danger that the last group may end up oversize or undersize.
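In Python the rule looks something like this (a minimal, untested sketch; the function name is made up, and as said, the last group just soaks up whatever error is left):

Code:
def greedy_cuts(weights, groups):
    # Walk the list once; cut whenever adding the next item would leave the
    # current group further from the target than stopping here would.
    target = sum(weights) / groups          # e.g. 3500 g / 10 = 350 g
    cuts, current = [], 0
    for i, w in enumerate(weights):
        if abs(current + w - target) < abs(current - target):
            current += w                    # still getting closer: keep it
        else:
            cuts.append(i)                  # cut before this item...
            current = w                     # ...and it starts the next group
    return cuts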
Quote from: Kim
Paging Diver300.  Diver300 to the GSM Trimphone, please...

Re: Algorithm question - even splitting
« Reply #6 on: 13 March, 2021, 03:02:19 pm »
Don't you have to define more precisely first what "As evenly balanced as possible by weight" means? That might point to an algorithm?

For example, because of the rules that you have set out, there might be a solution in which one group weighs 800g but all the other nine weigh 300g each. Is that better than half weighing 300g and other half 400g? And a third where one group weighs 5g, one 695g, and all the others are spot on 350g. Which is best? Is there a maximum acceptable variation from the mean for any one group?

Re: Algorithm question - even splitting
« Reply #7 on: 13 March, 2021, 03:38:10 pm »
Are you measuring "evenly divided" by calculating the standard deviation, or something else? What are your criteria?

As to the start of the algorithm: since it's a ring design, pick where to start at random. I'd then just take a slice every 35 items as a starting point. Next, pick the slice with the biggest deviation from your 350 g and adjust its slice start to try to reduce the standard deviation; that will affect two slice sizes. Then try the slice with the next biggest deviation that hasn't already been adjusted, and stop once every slice start has been adjusted (which may mean leaving some where they were). You can then run it with random starts however many times you want; the run that produces the lowest SD is your solution.
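A rough Python sketch of what I mean (untested, names invented; "deviation" here is just distance from the 350 g target, and the score is the SD of the slice weights):

Code:
import random
import statistics

def slice_weight(weights, s, e):
    # weight of items s .. e-1 going round the ring
    n = len(weights)
    if e <= s:
        e += n
    return sum(weights[i % n] for i in range(s, e))

def all_slice_weights(weights, starts):
    g = len(starts)
    return [slice_weight(weights, starts[k], starts[(k + 1) % g]) for k in range(g)]

def one_run(weights, groups, rng):
    # one random-start run of the "nudge the worst slice" idea above
    n = len(weights)
    step = n // groups                          # e.g. 350 // 10 = 35 items
    offset = rng.randrange(n)                   # random start on the ring
    starts = [(offset + k * step) % n for k in range(groups)]
    target = sum(weights) / groups              # e.g. 350 g
    adjusted = set()
    while len(adjusted) < groups:
        w = all_slice_weights(weights, starts)
        # not-yet-adjusted slice with the biggest deviation from the target
        k = max((j for j in range(groups) if j not in adjusted),
                key=lambda j: abs(w[j] - target))
        prev_s = starts[(k - 1) % groups]
        next_s = starts[(k + 1) % groups]
        span = (next_s - prev_s) % n            # this start must stay between its neighbours

        def sd_if(pos):
            cand = starts[:]
            cand[k] = pos % n
            return statistics.pstdev(all_slice_weights(weights, cand))

        if span > 1:                            # any room to move?
            starts[k] = min(range(prev_s + 1, prev_s + span), key=sd_if) % n
        adjusted.add(k)
    return statistics.pstdev(all_slice_weights(weights, starts)), starts

def best_of(weights, groups, runs=50, seed=1):
    # random restarts; keep the split with the lowest SD
    rng = random.Random(seed)
    return min(one_run(weights, groups, rng) for _ in range(runs))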

Ben T

Re: Algorithm question - even splitting
« Reply #8 on: 13 March, 2021, 06:36:10 pm »
Do you have to find the *best* solution or merely a reasonably good one in a reasonable time?

If the latter then simulated annealing (or some lowbrow variation) would do the job.

No, it doesn't necessarily have to be the absolute best, just reasonably even - one where the total weights of the groups are within a reasonable proportion of each other, say 10%, would be good.

(I'll google "simulated annealing" as I've no idea what it is, unless you have a recommended resource?)

Are you measuring "evenly divided" by calculating the standard deviation, or something else? What are your criteria?

As to the start of the algorithm: since it's a ring design, pick where to start at random. I'd then just take a slice every 35 items as a starting point. Next, pick the slice with the biggest deviation from your 350 g and adjust its slice start to try to reduce the standard deviation; that will affect two slice sizes. Then try the slice with the next biggest deviation that hasn't already been adjusted, and stop once every slice start has been adjusted (which may mean leaving some where they were). You can then run it with random starts however many times you want; the run that produces the lowest SD is your solution.

That sounds good.
What I keep thinking, though, is that I have to look at the situation as a whole.
Say I have cuts after items A, B, C, D, E and F, numbered from 1-350. If I pick on B to start with, set it randomly and adjust it so that A->B and B->C have very low SD, then I might be setting myself up for a situation where E falls bang smack in the middle of a massive item and has to go to one side of it or the other - meaning either D->E or E->F has high SD.

Don't you have to define more precisely first what "As evenly balanced as possible by weight" means? That might point to an algorithm?
OK, so that would be that the weight of the largest group is as small as possible.
So yes, in terms of pointing to an algorithm, thinking about it like that does give a good metric for how "good" a particular solution is.


For example, because of the rules that you have set out, there might be a solution in which one group weighs 800g but all the other nine weigh 300g each. Is that better than half weighing 300g and other half 400g?

No, the latter situation is much better, as the largest group is only 400g, rather than 800g.

Quote
And a third where one group weighs 5g, one 695g, and all the others are spot on 350g. Which is best? Is there a maximum acceptable variation from the mean for any one group?
That's better than one with an 800g group, but not as good as one where half weigh 300g and half 400g.
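In code terms the metric is then just the weight of the heaviest group, so those three examples score like this (lower is better):

Code:
def score(group_weights):
    return max(group_weights)            # weight of the heaviest group

print(score([800] + [300] * 9))          # 800
print(score([300] * 5 + [400] * 5))      # 400  <- best of the three
print(score([5, 695] + [350] * 8))       # 695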



Ben T

Re: Algorithm question - even splitting
« Reply #9 on: 13 March, 2021, 06:39:25 pm »
The items have to remain in order - so basically I have to make 10 cuts, and it's just a case of where to make them.
That is a significant restriction and a significant simplification of the problem.

You can add up all the items' weights, divide by 10, and then just work through the list and cut off as close to the average as possible.

For instance if the total is 3500 g, each group should be 350 g. For each new item, see if adding it takes the total closer to 350 g or further away. If adding an item would take you further from 350g, don't do it, cut off the group and start again.

The first items in the group will obviously move the total closer to 350 g. After a load of items, if you get to a total of 340 g, that's 10 g off target. If the next item is 20 g or more, that would make the total 10 g or more away from 350 g, so cut off the group. If the next item is 19 g or less, the total will end up closer to 350 g, so it should be added to the group.

I think there is a danger that the last group may end up oversize or undersize.

Interesting, yes that gives another good starting point, thanks.

Re: Algorithm question - even splitting
« Reply #10 on: 13 March, 2021, 06:51:14 pm »
Do you have to find the *best* solution or merely a reasonably good one in a reasonable time?

If the latter then simulated annealing (or some lowbrow variation) would do the job.

No, it doesn't necessarily have to be the absolute best, just reasonably even - one where the total weights of the groups are within a reasonable proportion of each other, say 10%, would be good.

(I'll google "simulated annealing" as I've no idea what it is, unless you have a recommended resource?)

Are you measuring "evenly divided" by calculating the standard deviation, or something else? What are your criteria?

As to the start of the algorithm: since it's a ring design, pick where to start at random. I'd then just take a slice every 35 items as a starting point. Next, pick the slice with the biggest deviation from your 350 g and adjust its slice start to try to reduce the standard deviation; that will affect two slice sizes. Then try the slice with the next biggest deviation that hasn't already been adjusted, and stop once every slice start has been adjusted (which may mean leaving some where they were). You can then run it with random starts however many times you want; the run that produces the lowest SD is your solution.

That sounds good.
What I keep thinking, though, is that I have to look at the situation as a whole.
Say I have cuts after items A, B, C, D, E and F, numbered from 1-350. If I pick on B to start with, set it randomly and adjust it so that A->B and B->C have very low SD, then I might be setting myself up for a situation where E falls bang smack in the middle of a massive item and has to go to one side of it or the other - meaning either D->E or E->F has high SD.

Don't you have to define more precisely first what "As evenly balanced as possible by weight" means? That might point to an algorithm?
OK, so that would be that the weight of the largest group is as small as possible.
So yes, in terms of pointing to an algorithm, thinking about it like that does give a good metric for how "good" a particular solution is.


For example, because of the rules that you have set out, there might be a solution in which one group weighs 800g but all the other nine weigh 300g each. Is that better than half weighing 300g and other half 400g?

No, the latter situation is much better, as the largest group is only 400g, rather than 800g.

Quote
And a third where one group weighs 5g, one 695g, and all the others are spot on 350g. Which is best? Is there a maximum acceptable variation from the mean for any one group?
That's better than one with an 800g group, but not as good as one where half weigh 300g and half 400g.

This is where picking a random start position works. Run it a few times and you'll get past that. The more you run it, the better the solution you'll discover.

Re: Algorithm question - even splitting
« Reply #11 on: 13 March, 2021, 06:52:42 pm »
(I'll google "simulated annealing" as I've no idea what it is, unless you have a recommended resource?)

You start by choosing some arbitrary boundaries and give it a score. You then randomly vary the boundaries by some amount and see if it improves the score. If it does, keep it. Repeat until bored.

The annealing part is that you start by randomly varying the boundaries by a lot and then with each successive round you reduce the amount they can be varied.

The idea is that you try to find a ballpark solution with the early rounds and zero in on the local maxima with the later rounds.

I've never found a use for it, but AIUI it produces results almost as good as brute forcing with orders of magnitude less computation.
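As a bare-bones Python sketch (untested, names made up; this is really just the hill-climbing variant described above - a proper annealer would also sometimes accept worse moves - using Ben's "smallest heaviest group" score):

Code:
import random

def group_weights(weights, cuts):
    # ring group weights for distinct cut positions (a cut goes before each index)
    n = len(weights)
    cuts = sorted(cuts)
    return [sum(weights[i % n] for i in range(s, e if e > s else e + n))
            for s, e in zip(cuts, cuts[1:] + [cuts[0]])]

def improve(weights, groups, rounds=2000, seed=0):
    rng = random.Random(seed)
    n = len(weights)
    cuts = sorted(rng.sample(range(n), groups))    # arbitrary starting boundaries
    best = max(group_weights(weights, cuts))       # score = heaviest group, lower is better
    for r in range(rounds):
        step = max(1, (n // groups) * (rounds - r) // rounds)   # big moves early, small later
        cand = cuts[:]
        k = rng.randrange(groups)
        cand[k] = (cand[k] + rng.randint(-step, step)) % n
        if len(set(cand)) < groups:                # two boundaries collided, skip
            continue
        score = max(group_weights(weights, cand))
        if score < best:                           # keep it only if it improves
            cuts, best = sorted(cand), score
    return best, cuts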

Chris S

Re: Algorithm question - even splitting
« Reply #12 on: 13 March, 2021, 06:59:42 pm »
This is like advent, in lent!

Feanor

  • It's mostly downhill from here.
Re: Algorithm question - even splitting
« Reply #13 on: 13 March, 2021, 07:01:41 pm »
Depends how scalable it needs to be.
For 350 items into 10 bins, I would not hesitate to brute-force it.

My approach would be similar to that outlined by Diver300.
I'd maintain an array [10,350] to hold the results.
For each Starting Point [0..350], I'd store the resulting 10 bin weights so we can do stats on it at the end.

Get the total weight, divide by 10 to get the ideal bin weight.

Choose a starting point.

Add items sequentially till you go over the nominal bin weight.
Then, we have to make a decision.
Include the item that pushed us over the limit, for an overweight bin, or exclude it for an under-weight bin?
So perhaps go with whichever gives us the closest fit.

Move to the next bin, and repeat.

Once you've got all 10 bins, increment the starting point and do it all again.

As Diver300 noted, cumulative errors will all end up in the last bin.
In an 'ideal' dataset, the under/over decisions and magnitudes will average out so the cumulative error added to the final bin should be small.
But these are small datasets, and there's not enough data for things to average out 'long term', so it's likely the last bin will be significantly over/under.

My proposed solution to this would be:
Divide the total by 10, then fill the first bin to this value +/- one item per the initial idea, but then going forward to the second bin...
Subtract all assigned items from the total to get the remainder, then divide that by 9, and fill the next bin to this value etc etc.
Continue to calculate the remainder, then divide it amongst the remaining bins.
This way, we wash out the errors over the course of the calculations.

At the end, once we have iterated over all 350 possible starting points and the array is full, do stats to calculate the mean and SD of the resulting 10 bins, and choose your favourite.
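In Python that might look roughly like this (untested, names invented; I've scored each starting point by the SD of its bins, though Ben's "smallest heaviest bin" would work just as well):

Code:
import statistics

def fill_from(weights, start, bins):
    # fill `bins` bins going round the ring from `start`, re-targeting each bin
    # at (remaining weight / remaining bins) so the errors get washed out
    n = len(weights)
    order = [weights[(start + i) % n] for i in range(n)]
    remaining = sum(order)
    result, i = [], 0
    for left in range(bins, 0, -1):
        target = remaining / left
        total = 0
        # keep taking items while that brings the bin closer to its target
        # (always take at least one; the last bin takes everything left)
        while i < n and (total == 0 or left == 1 or
                         abs(total + order[i] - target) < abs(total - target)):
            total += order[i]
            i += 1
        result.append(total)
        remaining -= total
    return result

def best_start(weights, bins):
    # try every starting point and keep the one whose bins have the lowest SD
    return min((statistics.pstdev(fill_from(weights, s, bins)), s)
               for s in range(len(weights)))

Something like best_start(data, 10) would then give the lowest-SD starting point for the original problem.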



Ben T

Re: Algorithm question - even splitting
« Reply #14 on: 23 March, 2021, 12:15:00 pm »
The actual data is at the bottom of this post (there are actually 360 items, not 350).

The best that my brute-force approach has found so far is to make splits before items 81, 112, 183, 204, 239 and 297.
81 <= item < 112 is the largest group, at 31547360, which is only 0.64% above the mean of 31345243.3333333 (the mean being the lowest possible value for the heaviest group).

This is more than good enough for the real purpose - it's just academically interesting whether it's possible to prove that no better solution exists.
That 0.64% sounds low, but it is still over 200,000 "weight units" - and a lot of the items are lighter than that - so a better split isn't obviously impossible... but it could be.
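For the academic part: one standard way (not something anyone suggested above, just the usual trick for min-max contiguous partitioning) to get a provably optimal answer is to binary-search on the heaviest-group weight with a greedy feasibility check, trying every rotation of the ring as the fixed cut - only a few million operations for 360 items. A rough sketch; running it on the data below would settle whether 31547360 can be beaten:

Code:
def feasible(weights, cap, groups):
    # can the (linear) list be cut into at most `groups` runs, each weighing <= cap?
    used, total = 1, 0
    for w in weights:
        if w > cap:
            return False
        if total + w > cap:
            used += 1
            total = w
        else:
            total += w
    return used <= groups

def min_heaviest_linear(weights, groups):
    lo, hi = max(weights), sum(weights)
    while lo < hi:                      # binary search on the heaviest-group cap
        mid = (lo + hi) // 2
        if feasible(weights, mid, groups):
            hi = mid
        else:
            lo = mid + 1
    return lo

def min_heaviest_ring(weights, groups):
    # every split has a cut somewhere, so try each rotation as that fixed cut;
    # "at most `groups` runs" can always be refined to exactly `groups` without
    # raising the max, so this matches making exactly `groups` cuts on the ring
    n = len(weights)
    return min(min_heaviest_linear(weights[r:] + weights[:r], groups)
               for r in range(n))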

Code:
Item      Weight
0 1132
1 1032
2 348
3 1820
4 4056
5 1288
6 1292
7 1424
8 3556
9 3192
10 1008
11 416
12 44
13 896
14 1188
15 664
16 660
17 1584
18 1120
19 1024
20 6744
21 8800
22 15872
23 9988
24 11640
25 2200
26 480
27 1916
28 8364
29 3700
30 56960
31 2396
32 8600
33 1728
34 3368
35 924
36 868
37 688
38 900
39 1524
40 2016
41 1524
42 1512
43 1924
44 5832
45 3788
46 3652
47 4708
48 3044
49 2176
50 2724
51 5704
52 7348
53 10352
54 14404
55 78568
56 331168
57 883472
58 478868
59 312204
60 320944
61 372392
62 504524
63 369844
64 270056
65 292020
66 219480
67 372960
68 497080
69 269288
70 229012
71 260820
72 307960
73 504684
74 511076
75 592580
76 555856
77 522320
78 674164
79 1091756
80 1323440
81 1328924
82 1352300
83 1118820
84 930940
85 811060
86 881984
87 784212
88 625204
89 883012
90 956988
91 963188
92 964892
93 1037880
94 950936
95 1238212
96 1102848
97 1288724
98 1299400
99 1324176
100 1274024
101 1029972
102 1130212
103 1263272
104 1220560
105 1042612
106 1120600
107 879384
108 1112196
109 801556
110 387852
111 441420
112 329004
113 444944
114 415596
115 449988
116 372900
117 204640
118 311444
119 303512
120 184188
121 457460
122 235464
123 204116
124 254184
125 357040
126 315196
127 360700
128 536272
129 439764
130 802852
131 773928
132 768372
133 901272
134 476872
135 560284
136 652340
137 683916
138 543720
139 639232
140 501248
141 524720
142 426368
143 327092
144 492160
145 114032
146 12
147 564
148 1632
149 12
150 12
151 11052
152 6056
153 12
154 13160
155 5724
156 10748
157 8052
158 32800
159 8084
160 5188
161 7580
162 133916
163 280032
164 199568
165 73372
166 153664
167 128616
168 100460
169 122920
170 425508
171 1151856
172 933348
173 791808
174 762816
175 875588
176 1126372
177 1347180
178 1877456
179 1816560
180 1150096
181 1155896
182 1408832
183 1320388
184 1546296
185 1741364
186 1761356
187 2198132
188 2303600
189 2246852
190 2152184
191 1680408
192 1793600
193 1868812
194 1437292
195 1432648
196 1041160
197 820836
198 1045432
199 879084
200 949684
201 979204
202 987016
203 1057920
204 939516
205 1104596
206 1027668
207 1484200
208 1543024
209 1700360
210 2463932
211 2307188
212 1763264
213 1480124
214 1491436
215 1730368
216 1138448
217 1148156
218 864980
219 996964
220 488236
221 393840
222 396656
223 601688
224 816032
225 538976
226 515960
227 483028
228 489772
229 405884
230 402968
231 526296
232 249140
233 155468
234 239948
235 374224
236 211648
237 206400
238 170820
239 182876
240 145760
241 113800
242 64396
243 78316
244 130876
245 135128
246 171940
247 300440
248 228140
249 334160
250 236412
251 278600
252 418648
253 724812
254 780616
255 857924
256 1129380
257 1354996
258 1046576
259 888264
260 914164
261 439672
262 231076
263 268620
264 207468
265 400996
266 245540
267 174964
268 360504
269 214828
270 229152
271 197416
272 125640
273 87768
274 197564
275 331368
276 340552
277 239764
278 470260
279 663372
280 1193908
281 870764
282 838108
283 890148
284 788852
285 1144576
286 1705632
287 1142672
288 914532
289 673800
290 1033688
291 688868
292 860152
293 818508
294 680048
295 720644
296 636904
297 499008
298 409748
299 701120
300 1160232
301 768988
302 321644
303 302440
304 390964
305 380432
306 467196
307 534496
308 276128
309 272528
310 927812
311 553488
312 491736
313 578332
314 452140
315 890760
316 974664
317 757172
318 867588
319 1759988
320 1282620
321 528368
322 204020
323 182888
324 244244
325 235596
326 105868
327 93132
328 65104
329 157176
330 157572
331 180108
332 140672
333 142844
334 1544
335 3412
336 1948
337 1972
338 8136
339 3484
340 3016
341 1744
342 900
343 812
344 4648
345 6684
346 13700
347 8496
348 17160
349 14944
350 20624
351 15084
352 52712
353 28568
354 140728
355 81220
356 52976
357 31528
358 18216
359 7980