简体   繁体   中英

Storing files for testbin/pastebin in Python

I'm basically trying to setup my own private pastebin where I can save html files on my private server to test and fool around - have some sort of textarea for the initial input, save the file, and after saving I'd like to be able to view all the files I saved.

I'm trying to write this in python, just wondering what the most practical way would be of storing the file(s) or the code? SQLite? Straight up flat files?

One other thing I'm worried about is the uniqueness of the files, obviously I don't want conflicting filenames ( maybe save using 'title' and timestamp? ) - how should I structure it?

I wrote something similar a while back in Django to test jQuery snippets. See:

http://jquery.nodnod.net/

I have the code available on GitHub at http://github.com/dz/jquerytester/tree/master if you're curious.

If you're using straight Python, there are a couple ways to approach naming:

  1. If storing as files, ask for a name, salt with current time, and generate a hash for the filename.

  2. If using mysqlite or some other database, just use a numerical unique ID.

Personally, I'd go for #2. It's easy, ensures uniqueness, and allows you to easily fetch various sets of 'files'.

Have you considered trying lodgeit . Its a free pastbin which you can host yourself. I do not know how hard it is to set up.

Looking at their code they have gone with a database for storage (sqllite will do). They have structured there paste table like, (this is sqlalchemy table declaration style). The code is just a text field.

pastes = Table('pastes', metadata,
        Column('paste_id', Integer, primary_key=True),
        Column('code', Text),
        Column('parent_id', Integer, ForeignKey('pastes.paste_id'),
               nullable=True),
        Column('pub_date', DateTime),
        Column('language', String(30)),
        Column('user_hash', String(40), nullable=True),
        Column('handled', Boolean, nullable=False),
        Column('private_id', String(40), unique=True, nullable=True)
    )

They have also made a hierarchy (see the self join) which is used for versioning.

Plain files are definitely more effective. Save your database for more complex queries.

If you need some formatting to be done on files, such as highlighting the code properly, it is better to do it before you save the file with that code. That way you don't need to apply formatting every time the file is shown.

You definitely would need somehow ensure all file names are unique, but this task is trivial, since you can just check, if the file already exists on the disk and if it does, add some number to its name and check again and so on.

Don't store them all in one directory either, since filesystem can perform much worse if there are A LOT (~ 1 million) files in the single directory, so you can structure your storage like this:

FILE_DIR/YEAR/MONTH/FileID.html and store the "YEAR/MONTH/FileID" Part in the database as a unique ID for the file.

Of course, if you don't worry about performance (not many users, for example) you can just go with storing everything in the database, which is much easier to manage.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM