Python script to list duplicated notes
-
Does anyone have a script to detect duplicated notes?... [Check last post]
After sync problems that caused uncontrolled cloning, plus flaws in the merge strategy, I've ended up with a total mess...
I'm tired of cleaning it up manually again and again... Trying to sync 5 systems only makes things worse...
-
@rotfl said in Looking for any kind of script to find duplicated notes:
Trying to sync 5 systems only makes things worse...
If the systems are in sync, shouldn't you just be able to clean up one system and have the changes propagate to the other systems?
Disclaimer: I haven't experienced this bug
-
@lonm I don't exactly understand the sync merge strategy...
And the problem is that I don't use all of these systems simultaneously (for example: 2 OSes on the same PC). I'm getting more and more overwhelmed with duplicates / clones / different versions of the same notes...
I really don't get why. From a developer's point of view, each entity should have a unique ID to prevent such behaviour.
Never mind... I understand that sync is currently in a "WIP state". I just want to clean up a bit because I don't want to get lost in the trash
-
It might be a good starting point to inspect vivaldi.notes in the browser console. That might show whether there is an ID conflict that is upsetting sync.
You can see everything with the command:
vivaldi.notes.getTree(console.log)
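If the console route is awkward, the tree can also be inspected offline with a short Python sketch. This assumes the Notes file in the profile directory is a JSON tree of nodes with `type`, `id`, `subject` and `children` fields; back the file up and open it read-only:

```python
import json

def print_tree(node, depth=0):
    """Return an indented line per node: type, id and first chars of subject."""
    lines = []
    for item in node.get('children', []):
        lines.append(('  ' * depth + item['type'] + ' ' + item['id']
                      + ' ' + item.get('subject', '')[:30]).rstrip())
        if 'children' in item:
            lines.extend(print_tree(item, depth + 1))
    return lines

# To inspect your real file (path is profile-specific):
# with open("C:/Users/xxxxxxx/AppData/Local/Vivaldi/User Data/Default/Notes",
#           encoding="utf8") as f:
#     print('\n'.join(print_tree(json.load(f))))

# Tiny sample tree in the assumed shape of the Notes file:
sample = {'children': [
    {'type': 'folder', 'id': '1', 'children': [
        {'type': 'note', 'id': '2', 'subject': 'shopping list'},
    ]},
    {'type': 'note', 'id': '3', 'subject': 'shopping list'},
]}
print('\n'.join(print_tree(sample)))
```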
-
The search field in the Notes Panel could help you find duplicates quickly as long as you don't have hundreds of notes.
-
@lonm How do I open that console?
I used a simple, dumb Python comparison script like this
import json

notes_path = "C:/Users/xxxxxxx/AppData/Local/Vivaldi/User Data/Default/Notes"

def are_the_same(note1, note2):
    print("Comparing notes " + note1['id'] + " & " + note2['id'])
    if note1['content'] == note2['content'] and \
       note1['subject'] == note2['subject'] and \
       note1['type'] == note2['type'] and \
       note1['url'] == note2['url']:
        if ('attachments' in note1) != ('attachments' in note2):
            print('Difference with attachments')
        if note1['date_added'] != note2['date_added']:
            print('Difference with date_added')
        return True
    else:
        return False

flattened_notes = {}

def walk(node, path):
    for item in node['children']:
        if item['type'] == 'note':
            process_note(item, path)
        elif item['type'] in ('folder', 'other', 'trash'):
            walk(item, path + item['id'] + '\\')

def process_note(note, path):
    flattened_notes[note['id']] = note
    # print(path + note['id'])
    # if 'attachments' not in note:
    #     print("no attachments")

def compare_notes_by_id(note1_id, note2_id):
    if are_the_same(flattened_notes[note1_id], flattened_notes[note2_id]):
        print("+")
    else:
        print("Notes are different")

with open(notes_path, mode='r', encoding="utf8") as file:
    notes = json.load(file)

walk(notes, '')
# print('Loaded notes: ' + str(len(flattened_notes.keys())))
compare_notes_by_id('123', '456')
because I had no idea of the possible consequences of deleting them directly in the file...
@Pesala It's an especially hard task with many really long notes...
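Instead of calling compare_notes_by_id by hand for every pair of IDs, the same comparison can be run over all pairs automatically. A sketch (not from the thread) using itertools.combinations, with inline sample data standing in for the real flattened_notes dict:

```python
import itertools

def are_the_same(note1, note2):
    """Same idea as above: content, subject, type and url must all match."""
    keys = ('content', 'subject', 'type', 'url')
    return all(note1[k] == note2[k] for k in keys)

# Sample entries in place of the dict built by walk()
flattened_notes = {
    '123': {'content': 'a', 'subject': 's', 'type': 'note', 'url': ''},
    '456': {'content': 'a', 'subject': 's', 'type': 'note', 'url': ''},
    '789': {'content': 'b', 'subject': 's', 'type': 'note', 'url': ''},
}

# Every unordered pair of note IDs is checked exactly once
duplicate_pairs = [(id1, id2)
                   for id1, id2 in itertools.combinations(flattened_notes, 2)
                   if are_the_same(flattened_notes[id1], flattened_notes[id2])]
print(duplicate_pairs)  # -> [('123', '456')]
```

This is O(n²) in the number of notes, which is fine for a few hundred notes; grouping by a hash of the compared fields scales better.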
-
@rotfl That script you have there could work; as long as you back up your notes file, I don't think there's any harm that could be done.
HOWEVER. If you edit the notes file while sync is still active, the next time you turn sync on / Vivaldi starts, it's just going to download all the notes which it thinks are missing.
If you have everything else (like bookmarks) synced to your current profile, the best bet would be to force the server to clear the data it stores about you, then edit the notes to remove dupes, then turn sync back on.
Note that this will end up duplicating the notes that are stored on any other machines, so it might be best to refresh the profiles on the other machines (keeping the existing ones as backups), and then download all the new data through sync once the notes are fixed.
-
@lonm said in Looking for any kind of script to find duplicated notes:
Note that this will end up duplicating the notes that are stored on any other machines, so it might be best to refresh the profiles on the other machines (keeping the existing ones as backups), and then download all the new data through sync once the notes are fixed.
It's not the best scenario.
I did this once with a different group of machines, and as a result all of the notes were cloned xD
-
@rotfl I don't understand how the notes could be cloned if the other profiles were empty to begin with. If that was the case, then the notes must be getting cloned on one profile only.
Have you ever tried clicking this button (with working profiles backed up):
Before refreshing profiles?
-
@lonm Yup.
The main problem is that Sync does not sync attachments.
So I could not start with a totally empty profile. I just wiped the server data, cloned the Notes file between computers & tried to start sync from scratch.
I have no idea how notes are matched, but it's not just by their ID / content... The first synchronisation cloned all existing notes / folders on both computers.
Partial notes sync would be a great idea for a bigger mess...
-
Final version
This script lists all duplicated notes (with 20 chars of subject, 20 chars of content, and their IDs), grouped by content.
It ignores creation date / attachments while comparing notes.
import json

notes_path = "C:/Users/xxxxxxx/AppData/Local/Vivaldi/User Data/Default/Notes"

class Note:
    def __init__(self, id, subject, content, type, url, attachments):
        self.id = id
        self.subject = subject
        self.content = content
        self.type = type
        self.url = url
        self.attachments = attachments

    def __hash__(self):
        return hash((self.subject, self.content, self.type, self.url,
                     self.attachments))

    def __eq__(self, other):
        return self.__class__ == other.__class__ and \
            (self.subject, self.content, self.type, self.url, self.attachments) == \
            (other.subject, other.content, other.type, other.url, other.attachments)

    def __ne__(self, other):
        return not (self == other)

flattened_notes = {}

def load(node, path):
    for item in node['children']:
        if item['type'] == 'note':
            process_note(item, path)
        elif item['type'] in ('folder', 'other', 'trash'):
            load(item, path + item['id'] + '\\')

def process_note(note, path):
    # 0 => Ignoring attachments!
    flattened_notes[note['id']] = Note(note['id'], note['subject'],
                                       note['content'], note['type'],
                                       note['url'], 0)

with open(notes_path, mode='r', encoding="utf8") as file:
    notes = json.load(file)

load(notes, '')
print('Loaded notes: ' + str(len(flattened_notes.keys())))

# Invert the id -> note mapping: identical notes collapse onto one key
flipped_multidict = {}
for key, value in flattened_notes.items():
    flipped_multidict.setdefault(value, set()).add(key)

doubles = [(key.subject[0:20], key.content[0:20], values)
           for key, values in flipped_multidict.items() if len(values) > 1]
for item in doubles:
    print(item)
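The grouping step in this script is just inverting a dict so that equal values collapse onto one key (that is why Note defines __hash__ and __eq__: it makes whole notes usable as dict keys). A standalone sketch of the idea with toy tuple data:

```python
# Group keys by identical (hashable) values -- the flipped_multidict idea
notes = {
    'id1': ('shopping', 'milk, eggs'),
    'id2': ('shopping', 'milk, eggs'),   # duplicate of id1
    'id3': ('todo', 'file taxes'),
}

flipped = {}
for note_id, value in notes.items():
    # setdefault creates the set the first time a value is seen
    flipped.setdefault(value, set()).add(note_id)

# Any value that maps to more than one ID is a duplicate group
doubles = [ids for ids in flipped.values() if len(ids) > 1]
print(doubles)  # -> [{'id1', 'id2'}]
```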
@Pesala @Ayespy @Gwen-Dragon I am not sure if this is the right category for this...
-