
How can I diff and patch/merge strings instead of files?

I'm working on a project where people are able to submit stories and have other people contribute. Rather than simply overwriting an entry in the database, I would like to store the changes people make rather than the entire new text. Then I can dynamically apply diffs if people want to revert to a previous version. I can also easily present users who are Editors with only the modified text so that they can jump right to the changes.

I am aware of how to take diff files and patch other files with them. But I'm making a web app with Python and Django, and I'll be storing all of these diffs in a MySQL database. Given that performance isn't a major issue for this app, I am prepared to pull the data from the DB, make files, and run git diff and patch on those files.

Is there a better way than building new files and deleting them every time I want to create a new version or apply a new diff? Is there some way to run diffs on plain text instead of files? E.g., setting variables in bash to be the contents of (what would be) a file (but is actually data from the DB), and running git diff on them? I would like to control these actions from a Python file after the user submits a form.

I'm really just looking for a good way to get started on this problem, so any help would be greatly appreciated.

Thanks for your time,

ParagonRG

I have done quite a bit of searching for a solution to this. Python's difflib is fairly capable, but unfortunately its standard diffs contain the entire original strings along with records of what was changed. This differs from, say, a git diff, where you only see what was changed plus some surrounding context. difflib also provides a function called unified_diff, which does produce a shorter diff, but there is no matching function for rebuilding a string from a string and a diff. E.g., if I made a diff out of text1 and text2, called diff1, I couldn't generate text2 out of text1 and diff1.

I have therefore made a simple Python module that allows strings to be rebuilt, both forwards and backwards, from a single string and its related diffs. It's called merge_in_memory, and can be found at https://github.com/danielmoniz/merge_in_memory. Simply pull the repository and run setup.py.

A simple example of its usage:

import merge_in_memory as mim_module

str1 = """line 1
line 2"""
str2 = """line 1
line 2 changed"""

merger = mim_module.Merger()
print(merger.diff_make(str1, str2))

This will output:

--- 
+++ 
@@ -1,2 +1,2 @@
 line 1
-line 2
+line 2 changed

Diffs are simply strings (rather than generators, as they are when using difflib). You can create a number of diffs and apply them at once (i.e., fast-forward through a history or track back) with the diff_apply_bulk() function.

To step backwards through the history, simply ensure that the reverse argument is set to True when calling either diff_bulk() or diff_apply_bulk(). For example:

merge = merger.diff_apply_bulk(text3, [diff1, diff2], reverse=True)

If you started with text1 and generated text2 and text3 with diff1 and diff2 (diff1 turns text1 into text2, and diff2 turns text2 into text3), then text1 is rebuilt by the above line of code. Note that the list of diffs is still in ascending order. A 'merge', i.e., the result of applying a diff to a string, is itself a string.

All of this allows me to store diffs in the database as simple VARCHARs (or what-have-you). I can pull them out in order and apply them in either direction to generate the text I want, as long as I have a starting point.
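For readers who want to see the idea without installing anything, here is a rough standard-library sketch of the same store-diffs-and-replay approach. It is not merge_in_memory's actual implementation: make_diff, apply_diff and apply_bulk are hypothetical names. It assumes the diffs were produced by difflib.unified_diff, and it treats texts as lists of lines, so a trailing newline is not preserved.

```python
import difflib
import re

# Matches hunk headers such as "@@ -1,2 +1,4 @@" (a count of 1 may be omitted).
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def make_diff(old, new):
    """Return a unified diff of two strings as one string."""
    return "\n".join(
        difflib.unified_diff(old.splitlines(), new.splitlines(), lineterm=""))


def apply_diff(text, diff, reverse=False):
    """Apply a diff from make_diff() to text; reverse=True undoes it."""
    src = text.splitlines()
    out, pos, in_hunk = [], 0, False
    # Forward: drop '-' lines and add '+' lines. Reverse swaps the roles.
    drop, add = ("+", "-") if reverse else ("-", "+")
    for line in diff.splitlines():
        if line.startswith("@@"):
            m = HUNK_RE.match(line)
            if m is None:
                raise ValueError("malformed hunk header: %r" % line)
            start, count = m.group(3, 4) if reverse else m.group(1, 2)
            start = int(start)
            count = 1 if count is None else int(count)
            # An empty range names the line *before* the hunk.
            idx = start if count == 0 else start - 1
            out.extend(src[pos:idx])  # copy untouched lines before the hunk
            pos, in_hunk = idx, True
        elif not in_hunk or not line:
            continue  # the '---' / '+++' file headers precede the first hunk
        elif line[0] in (" ", drop):
            if pos >= len(src) or src[pos] != line[1:]:
                raise ValueError("diff does not match text at line %d" % (pos + 1))
            if line[0] == " ":
                out.append(src[pos])  # context line: keep it
            pos += 1
        elif line[0] == add:
            out.append(line[1:])
    out.extend(src[pos:])  # copy whatever follows the last hunk
    return "\n".join(out)


def apply_bulk(text, diffs, reverse=False):
    """Apply diffs (always listed oldest first) forwards or backwards."""
    for d in (reversed(diffs) if reverse else diffs):
        text = apply_diff(text, d, reverse=reverse)
    return text


text1 = "line 1\nline 2"
text2 = "line 1\nline 2 changed"
text3 = "line 0\nline 1\nline 2 changed\nline 3"
diff1, diff2 = make_diff(text1, text2), make_diff(text2, text3)
print(apply_bulk(text3, [diff1, diff2], reverse=True))  # rebuilds text1
```

Because each diff is an ordinary string, it can go into a text column as-is. apply_diff raises ValueError when a diff does not line up with the text, which is a cheap guard against applying diffs out of order.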

Please feel free to leave any comments about this, as it is my first Python module.

Thanks,

ParagonRG

Have a look at libgit. It is a C interface (with bindings for many other languages) that lets you manipulate a git repository in various ways.

It seems pretty low-level, so getting it to actually commit, diff and so on might be tedious, but it does at least have a function to add a blob to the repo without it needing to be on disk.

The alternative of course is to create a normal file-based repository and working copy and bounce stuff back and forth between the database and file system using os.system calls.
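As a rough illustration of the file-based route, the sketch below writes two strings to temporary files and runs the system diff tool on them. It swaps os.system for subprocess.run so that the output can be captured, and it assumes a POSIX-style diff executable is on the PATH. diff_via_files is a hypothetical helper name, not part of any library.

```python
import os
import shutil
import subprocess
import tempfile


def diff_via_files(old, new):
    """Write both strings to temp files, run `diff -u`, return its output."""
    if shutil.which("diff") is None:
        raise RuntimeError("no `diff` executable found on PATH")
    with tempfile.TemporaryDirectory() as tmp:
        old_path = os.path.join(tmp, "old.txt")
        new_path = os.path.join(tmp, "new.txt")
        with open(old_path, "w", encoding="utf-8") as f:
            f.write(old)
        with open(new_path, "w", encoding="utf-8") as f:
            f.write(new)
        result = subprocess.run(
            ["diff", "-u", old_path, new_path],
            capture_output=True, text=True)
    # diff exits with 0 (identical), 1 (different) or 2 (trouble).
    if result.returncode > 1:
        raise RuntimeError(result.stderr)
    return result.stdout
```

The temporary directory is removed as soon as the with block exits, so nothing has to be cleaned up by hand.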

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please indicate the site URL or the original address.
