I’m teaching AH for the first time this year and am using the Robert Gordon materials that my school has. I’m a bit stumped by the first program, which claims to be revision of Higher material, but I’m not sure how a function to locate and delete duplicate records in an array of records can be written using only Higher concepts. Does anyone have a solution for this program in Python they’d be willing to share? Although the pack includes programs for the AH content, I cannot see a solution for the first program.
————————————————————–
A department in a local secondary school kept a record of pupils’ email addresses, names and prelim marks (expressed as a percentage). They were kept in a file such as the one below:
tmadre12@inverglen.sch.uk,Thain,Madre,0.36
There was an error and some records were duplicated.
You are to write a program that will:
– Read the file rawpupildata.csv into an array of records
– Display the number of records in the original file
– Create a new file with no duplicate records
– Display the number of duplicate records.
– Display the minimum and maximum prelim percentage from the new file
Your output should look similar to the following:
60 records in the original file
New File created with 48 records
12 duplicates removed
Minimum Prelim Percentage: 18%
Maximum Prelim Percentage: 99%
I don’t know the materials, but off the top of my head it sounds well beyond the Higher course.
It would depend a bit on how records were implemented; I teach my pupils to use a class as a record. But even if people had used a list of lists or a list of tuples, it would be just as difficult, if not more so.
The easiest way to remove duplicates from a list in Python is to create a set from the list, then convert the set back into a list. Sets can’t contain duplicates, so any are simply dropped (although the original order is lost). The problem is that lists aren’t hashable (so a list of lists won’t work), and objects aren’t equal by default just because all their properties are equal: the programmer would need to add __eq__ and __hash__ methods. I’d say this is way beyond what is required at Higher.
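For anyone curious, here is a minimal sketch of the set-style approach, assuming each record is a dataclass. The class name Pupil and its field names are my own guesses at the file’s columns, not something from the materials. With frozen=True, @dataclass generates both __eq__ (comparing every field) and __hash__, so two records with identical fields count as duplicates. dict.fromkeys is used instead of set so that the original order is kept.

```python
from dataclasses import dataclass


# Hypothetical record for one pupil. frozen=True (with the default eq=True)
# makes @dataclass generate __eq__ and __hash__, so records can be hashed.
@dataclass(frozen=True)
class Pupil:
    email: str
    forename: str
    surname: str
    prelim: float


def remove_duplicates(records):
    # dict.fromkeys keeps the first occurrence of each record and, unlike
    # set(), preserves the original order (dicts are ordered in Python 3.7+).
    return list(dict.fromkeys(records))


# Two separate objects with identical fields
pupils = [
    Pupil("tmadre12@inverglen.sch.uk", "Thain", "Madre", 0.36),
    Pupil("tmadre12@inverglen.sch.uk", "Thain", "Madre", 0.36),
]
print(len(remove_duplicates(pupils)))  # 1
```

This is exactly the part that relies on __eq__ and __hash__, which is why it sits outwith Higher.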
If I expected Higher pupils to do it, I’d probably expect something like this (although it only compares email addresses, not the whole object):
def removeDuplicates(listOfObjects):
    # Keep a record only if no later record has the same email address,
    # so the last copy of each duplicate is the one that is kept
    new_data = []
    for i in range(len(listOfObjects)):
        unique = True
        for j in range(i + 1, len(listOfObjects)):
            if listOfObjects[i].email == listOfObjects[j].email:
                unique = False
        if unique:
            new_data.append(listOfObjects[i])
    return new_data
Here’s a link to a working example:
https://repl.it/@LeeMurray/Python-3-15
Even then, I think it’s way beyond what I’d expect of Higher pupils.
That’s what I thought as well, Lee. I’d given one pupil this program to try without really reading it properly. I assumed that it would only require the standard algorithms plus the use of a class to form a record.
It could also be done using pop() in Python to remove duplicates from the list without creating another one. (The code below would have to be adapted for an array of records. I use @dataclass for arrays of records, by the way.)
def removeDuplicates(listnums):
    counter = 0
    while counter < len(listnums):
        counter2 = counter + 1
        # Compare this item with every item after it
        while counter2 < len(listnums):
            if listnums[counter] == listnums[counter2]:
                # Remove the duplicate; don't advance counter2, because
                # the next item has just shifted into this position
                listnums.pop(counter2)
            else:
                counter2 += 1
        counter += 1
    return listnums

listnums = [1,2,1,7,3,4,2,3,4,2,3,5,6,7,5,4,5,5,5,3,2,3,4,5,1]
listnums = removeDuplicates(listnums)
print(listnums)   # [1, 2, 7, 3, 4, 5, 6]
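Adapting the pop() version to an array of records might look something like this. It is a sketch assuming records are @dataclass instances; the class name Pupil and its field names are my own guesses at the file’s columns. Because @dataclass generates __eq__, the == test compares every field, so the loop itself is unchanged from the number version.

```python
from dataclasses import dataclass


# Hypothetical record; @dataclass generates __eq__, which compares all fields
@dataclass
class Pupil:
    email: str
    forename: str
    surname: str
    prelim: float


def remove_duplicates(pupils):
    # Same in-place pop() approach as the number version, but on records
    counter = 0
    while counter < len(pupils):
        counter2 = counter + 1
        while counter2 < len(pupils):
            if pupils[counter] == pupils[counter2]:
                # Don't advance counter2: the next record shifts into its place
                pupils.pop(counter2)
            else:
                counter2 += 1
        counter += 1
    return pupils


# Made-up test records: the first and third are identical
pupils = [
    Pupil("tmadre12@inverglen.sch.uk", "Thain", "Madre", 0.36),
    Pupil("abrown01@inverglen.sch.uk", "Ann", "Brown", 0.72),
    Pupil("tmadre12@inverglen.sch.uk", "Thain", "Madre", 0.36),
]
pupils = remove_duplicates(pupils)
print(len(pupils))  # 2
```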
For Higher, I think the easiest solution to explain would be:
– Traverse the original array of records, copying each record into a new array of records.
– Before each copy, traverse the new array to check whether the record being copied already exists in it.
Not very efficient, but easy to explain.
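The two steps above could be sketched like this, assuming a plain class is used as the record (class and field names are my own assumptions). Since a plain class has no __eq__, the fields are compared one by one, which keeps everything within the standard algorithms: a linear search nested inside a traversal.

```python
# Hypothetical record structure: a plain class with no __eq__
class Pupil:
    def __init__(self, email, forename, surname, prelim):
        self.email = email
        self.forename = forename
        self.surname = surname
        self.prelim = prelim


def same_record(a, b):
    # Compare every field explicitly, since == on plain objects checks identity
    return (a.email == b.email and a.forename == b.forename
            and a.surname == b.surname and a.prelim == b.prelim)


def remove_duplicates(pupils):
    new_pupils = []
    for pupil in pupils:
        # Linear search: is this record already in the new array?
        found = False
        for existing in new_pupils:
            if same_record(pupil, existing):
                found = True
        # Only copy records that are not already there
        if not found:
            new_pupils.append(pupil)
    return new_pupils


# Made-up test records: the first two are identical, the third has a different mark
pupils = [
    Pupil("tmadre12@inverglen.sch.uk", "Thain", "Madre", 0.36),
    Pupil("tmadre12@inverglen.sch.uk", "Thain", "Madre", 0.36),
    Pupil("tmadre12@inverglen.sch.uk", "Thain", "Madre", 0.40),
]
print(len(remove_duplicates(pupils)))  # 2
```

The duplicate count for the report is then just len(pupils) - len(new_pupils).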
There’s a nice discussion to be had there for AH, which could then look at other solutions like Lee’s.
So much for my indentation when I posted!!! 🙂 Happy to send the file to anyone.
greg.reid@sqa.org.uk
It’s a good question for discussion in class. As the duplicate data isn’t used, you could read each line from the file and only add it to a list if it isn’t already present. This compares each line in full and also lets you find the original and new record counts at the same time.
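all_data = []
originalCount = 0
newCount = 0
# Read lines into a list, skipping any line that is already in it
with open("rawpupildata.csv", "r") as file:
    for line in file:
        # Strip the newline so a final line without one still matches its duplicate
        line = line.strip()
        originalCount += 1
        if line not in all_data:
            all_data.append(line)
            newCount += 1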
Then unpack each list element into a record structure, determine the minimum and maximum marks, then save the list of records to a new file.
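Those remaining steps might be sketched as follows. This assumes each line has the form email,forename,surname,mark with the mark stored as a fraction (as in the sample line, 0.36), and uses a dataclass as the record; the names Pupil, to_records, min_max_prelim, save_records and report are all my own, not from the materials.

```python
from dataclasses import dataclass


# Hypothetical record structure; field names are guesses at the file's columns
@dataclass
class Pupil:
    email: str
    forename: str
    surname: str
    prelim: float


def to_records(lines):
    # Unpack each "email,forename,surname,mark" line into a record
    records = []
    for line in lines:
        email, forename, surname, mark = line.strip().split(",")
        records.append(Pupil(email, forename, surname, float(mark)))
    return records


def min_max_prelim(records):
    # Standard find-minimum and find-maximum (assumes at least one record)
    minimum = records[0].prelim
    maximum = records[0].prelim
    for pupil in records[1:]:
        if pupil.prelim < minimum:
            minimum = pupil.prelim
        if pupil.prelim > maximum:
            maximum = pupil.prelim
    return minimum, maximum


def save_records(records, filename):
    # Write the de-duplicated records back out in the same CSV layout
    with open(filename, "w") as file:
        for pupil in records:
            file.write(f"{pupil.email},{pupil.forename},{pupil.surname},{pupil.prelim}\n")


def report(original_count, records, out_filename):
    save_records(records, out_filename)
    minimum, maximum = min_max_prelim(records)
    print(f"{original_count} records in the original file")
    print(f"New File created with {len(records)} records")
    print(f"{original_count - len(records)} duplicates removed")
    # Marks are stored as fractions, so scale them to whole percentages
    print(f"Minimum Prelim Percentage: {round(minimum * 100)}%")
    print(f"Maximum Prelim Percentage: {round(maximum * 100)}%")
```

After the reading loop above, calling report(originalCount, to_records(all_data), "newpupildata.csv") would print the five summary lines; newpupildata.csv is just a made-up name for the output file.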
Definitely a good one for discussion and for seeing the inventiveness and resourcefulness of the pupils. I wouldn’t call it revision of Higher, since I don’t teach to that depth. Checking the Higher specification, nested loops aren’t mentioned (though nested ifs are).
If they’re in your advanced higher class, I’d expect them to be able to come up with a solution, but I’d call it more of a challenge than revision.
Fixed the indentation Greg. Just needed to highlight with the code tag.