Insert Foreign Key with two tables containing the same columns in SQLite with Python

I have a SQLite database on my computer with two tables:

  • table_post (contains information about a Twitter post: likes, author, URL, etc.)
  • table_profile (contains information about a Twitter profile: username, description, followers, etc.)

I am using a Python script which creates two separate CSV files containing the rows of table_post and table_profile.

Then I use another script to transfer the CSV files to the SQLite database. Everything works fine until I want to link the two tables with a foreign key.

My table_post has these columns: post_ID(PK), profile_ID(FK), postUrl, postText, pubDate, commentCount, likeCount, profileUrl

My table_profile has these columns: profile_ID(PK), profileUrl, subCount, userName, profileDesc

Both tables have profileUrl. I would like to fill table_post.profile_ID with the corresponding table_profile.profile_ID by matching on the profileUrl column.

I know that we can use this SQLite query to join the rows:

 SELECT * FROM table_profile JOIN table_post ON table_profile.profileUrl = table_post.profileUrl;

I would like to insert the corresponding profile_ID into table_post using Python and sqlite3. What could I do? Do I need to write the ID while writing to the SQLite database? If yes, how?

Can I write a function that checks every row in table_post and associates it with a profile_ID? If yes, how?

Thanks.

What you need is an UPDATE statement, not an INSERT.

You must update table_post after you transfer the CSV files to the SQLite database:

UPDATE table_post
SET profile_ID = (SELECT table_profile.profile_ID FROM table_profile WHERE table_profile.profileUrl = table_post.profileUrl);
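A minimal sketch of running that UPDATE from Python with the standard sqlite3 module. It assumes an in-memory database and cut-down versions of the two tables; the sample rows are made up.

```python
import sqlite3

# In-memory database with cut-down versions of the question's tables (sample rows are made up)
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_profile (profile_ID INTEGER PRIMARY KEY, profileUrl TEXT UNIQUE, userName TEXT);
CREATE TABLE table_post (post_ID INTEGER PRIMARY KEY, profile_ID INTEGER, postUrl TEXT, profileUrl TEXT);
INSERT INTO table_profile (profileUrl, userName) VALUES ('url1', 'user1'), ('url2', 'user2');
INSERT INTO table_post (postUrl, profileUrl) VALUES ('p1', 'url2'), ('p2', 'url1'), ('p3', 'url2');
""")

# Correlated subquery: for every post, look up the profile that has the same profileUrl
conn.execute("""
UPDATE table_post
SET profile_ID = (SELECT table_profile.profile_ID FROM table_profile
                  WHERE table_profile.profileUrl = table_post.profileUrl)
""")
conn.commit()

rows = conn.execute("SELECT postUrl, profile_ID FROM table_post ORDER BY post_ID").fetchall()
print(rows)  # [('p1', 2), ('p2', 1), ('p3', 2)]
```

A post whose profileUrl matches no profile ends up with profile_ID set to NULL, because the subquery returns no row for it.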

If your version of SQLite is 3.33.0 or later, you can use the UPDATE ... FROM syntax:

UPDATE table_post AS t
SET profile_ID = p.profile_ID
FROM table_profile AS p
WHERE p.profileUrl = t.profileUrl;
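Which form to use depends on the SQLite library that Python is linked against, not on the Python version. A sketch, assuming the same cut-down tables and made-up rows, that checks `sqlite3.sqlite_version_info` and falls back to the correlated subquery on older libraries:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_profile (profile_ID INTEGER PRIMARY KEY, profileUrl TEXT UNIQUE);
CREATE TABLE table_post (post_ID INTEGER PRIMARY KEY, profile_ID INTEGER, profileUrl TEXT);
INSERT INTO table_profile (profileUrl) VALUES ('url1'), ('url2');
INSERT INTO table_post (profileUrl) VALUES ('url2'), ('url1'), ('url9');
""")

# UPDATE ... FROM needs the underlying SQLite library to be 3.33.0 or newer
if sqlite3.sqlite_version_info >= (3, 33, 0):
    conn.execute("""
    UPDATE table_post AS t
    SET profile_ID = p.profile_ID
    FROM table_profile AS p
    WHERE p.profileUrl = t.profileUrl
    """)
else:
    # Older libraries: use the correlated-subquery form instead
    conn.execute("""
    UPDATE table_post
    SET profile_ID = (SELECT profile_ID FROM table_profile
                      WHERE table_profile.profileUrl = table_post.profileUrl)
    """)
conn.commit()

rows = conn.execute("SELECT profileUrl, profile_ID FROM table_post ORDER BY post_ID").fetchall()
print(rows)  # [('url2', 2), ('url1', 1), ('url9', None)]
```

One behavioural difference: with UPDATE ... FROM, a post that joins to no profile is simply not updated (it keeps its old value), whereas the subquery form sets it to NULL. Here both give NULL for 'url9' because it started as NULL.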

If you have loaded both tables from the CSVs, and you are not using (or expecting to use) foreign key constraints, then you can simply run an UPDATE, e.g.:

UPDATE table_post 
    SET profile_ID = (SELECT table_profile.profile_ID 
    FROM table_profile 
    WHERE table_profile.profileurl = table_post.profileurl)
;
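This also answers the "function that checks every row" part of the question: no Python loop over the posts is needed, because one UPDATE after the load handles every row. A sketch of the whole flow, assuming CSV header names taken from the question's column lists. `io.StringIO` stands in for the real files; with real files you would pass `open("profiles.csv", newline="")` (a hypothetical file name) to `csv.DictReader` instead.

```python
import csv
import io
import sqlite3

# Stand-ins for the two CSV files; header names are assumed from the question's columns
PROFILE_CSV = """profileUrl,subCount,userName,profileDesc
url1,10,user1,blah1
url2,100,user2,blah2
"""
POST_CSV = """postUrl,postText,pubDate,commentCount,likeCount,profileUrl
post_url1,first post,2020-04-01,5,7,url2
post_url2,second post,2020-04-01,5,7,url1
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_profile (profile_ID INTEGER PRIMARY KEY, profileUrl TEXT UNIQUE,
                            subCount INTEGER, userName TEXT, profileDesc TEXT);
CREATE TABLE table_post (post_ID INTEGER PRIMARY KEY, profile_ID INTEGER, postUrl TEXT,
                         postText TEXT, pubDate TEXT, commentCount INTEGER,
                         likeCount INTEGER, profileUrl TEXT);
""")

# 1. Load both CSVs as they are; table_post.profile_ID stays NULL for now
conn.executemany(
    "INSERT INTO table_profile (profileUrl, subCount, userName, profileDesc) "
    "VALUES (:profileUrl, :subCount, :userName, :profileDesc)",
    csv.DictReader(io.StringIO(PROFILE_CSV)))
conn.executemany(
    "INSERT INTO table_post (postUrl, postText, pubDate, commentCount, likeCount, profileUrl) "
    "VALUES (:postUrl, :postText, :pubDate, :commentCount, :likeCount, :profileUrl)",
    csv.DictReader(io.StringIO(POST_CSV)))

# 2. A single UPDATE fills in profile_ID for every post
conn.execute("""
UPDATE table_post
SET profile_ID = (SELECT table_profile.profile_ID FROM table_profile
                  WHERE table_profile.profileUrl = table_post.profileUrl)
""")
conn.commit()

rows = conn.execute("SELECT postUrl, profile_ID FROM table_post ORDER BY post_ID").fetchall()
print(rows)  # [('post_url1', 2), ('post_url2', 1)]
```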

However, IF you want to use a Foreign Key constraint to enforce referential integrity, AND/OR you want to normalise the profileurl (reduce the duplication of the data), then an alternative approach would be to:

  1. read the profiles CSV file and insert/load the profiles, and
  2. then read the posts CSV file and insert/load the posts, using a subquery to resolve each post's profile_ID during the INSERT.
  • The UPDATE after the inserts/loads would fail with Foreign Key constraint exceptions unless foreign key handling was turned off (see PRAGMA foreign_keys).

    • You may have to turn Foreign Key handling on, as by default it is turned off.
  • Using a Foreign Key constraint not only enforces referential integrity but, when the constraint is declared with ON DELETE CASCADE / ON UPDATE CASCADE, can also cascade updates and deletions from the profile to the posts. E.g. if you were to delete a profile, then all of the posts related to that profile would be deleted rather than a foreign key exception being raised.

    • Without a Foreign Key constraint, deleting a profile could orphan posts (leave them without a profile to relate to).
  • The crux of the first proposed method is setting table_post's profile_id column using the subquery (SELECT profile_id FROM table_profile WHERE profileurl = 'url2'). This subquery increases the time each insert takes. The cost is reduced if profileurl has a UNIQUE constraint (SQLite backs it with an index), but then inserts into the profile table take a little longer; seeing as profiles are the parents (there will typically be fewer of them), inserting profiles would be less of a factor.

  • About normalising: in your current model the profileurl is stored in both the parent (profiles) and the children (posts), but you appear not to want to use it for the relationship, so storing the profileurl in the post table is not needed. That superfluous storage space can be freed, and you no longer have to maintain the duplicated occurrences. Say profile X has a URL of x_is_here and for some reason that URL needs to change to x_is_not_here: if the data is not normalised, you would have to change every post related to X. If the URL were stored only in the profile, it would only need to be changed once.
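The two-step load described above (profiles first, then posts with the profile_id resolved by a subquery during the INSERT) might look like this with Python's sqlite3 module. This is a minimal sketch: the rows are hypothetical stand-ins for the CSV contents, and the post table is cut down to a few columns and stores no profileurl (the normalised form).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Foreign key enforcement is off by default and has to be enabled on each connection
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE table_profile (profile_id INTEGER PRIMARY KEY, profileurl TEXT UNIQUE, userName TEXT);
CREATE TABLE table_post (
    post_id INTEGER PRIMARY KEY,
    profile_id INTEGER REFERENCES table_profile(profile_id),
    post_url TEXT,
    postText TEXT
);
""")

# Step 1: load the parents first (hypothetical rows standing in for the profile CSV)
profiles = [("url1", "user1"), ("url2", "user2"), ("url2", "user2")]  # last row is a duplicate
conn.executemany(
    "INSERT OR IGNORE INTO table_profile (profileurl, userName) VALUES (?, ?)", profiles)

# Step 2: load the children; the subquery turns each post's profile URL into a profile_id
posts = [("post_url1", "1st post", "url2"), ("post_url2", "2nd post", "url1")]
conn.executemany(
    "INSERT INTO table_post (post_url, postText, profile_id) "
    "VALUES (?, ?, (SELECT profile_id FROM table_profile WHERE profileurl = ?))",
    posts)
conn.commit()

# The constraint rejects a post that points at a profile_id that does not exist
try:
    conn.execute("INSERT INTO table_post (post_url, profile_id) VALUES ('bad', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

rows = conn.execute(
    "SELECT p.post_url, pr.profileurl FROM table_post AS p "
    "JOIN table_profile AS pr ON pr.profile_id = p.profile_id ORDER BY p.post_id").fetchall()
print(rejected, rows)  # True [('post_url1', 'url2'), ('post_url2', 'url1')]
```

A post whose URL matches no profile would get a NULL profile_id from the subquery, which a REFERENCES constraint allows; declaring the column NOT NULL would make such rows fail instead.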

Another approach could be to use the already existing relationship, that is, profileurl. However, again, if you want to enforce referential integrity, then table_profile and table_post would need to be modified a little (profileurl must be UNIQUE in table_profile, and the post table's profileurl must REFERENCE it); the insert/load as is would then be fine.
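A minimal sketch of this alternative with Python's sqlite3, assuming cut-down tables and made-up rows. The ON UPDATE CASCADE clause is an addition not found in the answer's schema; it is included to show the cascading described above, where a profile URL changed once is carried over to its posts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE table_profile (profile_id INTEGER PRIMARY KEY, profileurl TEXT UNIQUE, userName TEXT);
-- The FK points at the UNIQUE profileurl column; ON UPDATE CASCADE is an added assumption
CREATE TABLE table_post_alt (
    post_id INTEGER PRIMARY KEY,
    post_url TEXT,
    profileurl TEXT REFERENCES table_profile(profileurl) ON UPDATE CASCADE
);
""")

conn.executemany("INSERT INTO table_profile (profileurl, userName) VALUES (?, ?)",
                 [("url1", "user1"), ("url2", "user2")])
# Posts can be loaded as they come: the profileurl from the CSV is itself the foreign key
conn.executemany("INSERT INTO table_post_alt (post_url, profileurl) VALUES (?, ?)",
                 [("post_url1", "url2"), ("post_url2", "url1")])

# A post pointing at an unknown profile URL is rejected
try:
    conn.execute("INSERT INTO table_post_alt (post_url, profileurl) VALUES ('orphan', 'url9')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Changing a profile's URL once updates every related post
conn.execute("UPDATE table_profile SET profileurl = 'url2_new' WHERE profileurl = 'url2'")
conn.commit()

rows = conn.execute("SELECT post_url, profileurl FROM table_post_alt ORDER BY post_id").fetchall()
print(orphan_rejected, rows)  # True [('post_url1', 'url2_new'), ('post_url2', 'url1')]
```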

Example

This is an example based upon the schema you have described, but utilising Foreign Key constraints. It also includes an alternative table, table_post_alt, that utilises profileurl for the relationship.

PRAGMA foreign_keys = ON; /* FK enforcement is off by default; enable it for this connection */
DROP TABLE IF EXISTS table_post;
DROP TABLE IF EXISTS table_post_alt;
DROP TABLE IF EXISTS table_profile;
CREATE TABLE IF NOT EXISTS table_profile (
    profile_id INTEGER PRIMARY KEY, 
    profileurl TEXT UNIQUE, 
    subCount INTEGER, 
    userName TEXT, 
    profileDesc TEXT
);
CREATE TABLE IF NOT EXISTS table_post (
    post_id INTEGER PRIMARY KEY, 
    profile_id INTEGER REFERENCES table_profile(profile_id), 
    post_url TEXT, 
    postText TEXT, 
    pubDate TEXT, 
    commentCount INTEGER, 
    likeCount INTEGER, 
    profileurl TEXT
);
CREATE TABLE IF NOT EXISTS table_post_alt (
    post_id INTEGER PRIMARY KEY, 
    profile_id INTEGER, 
    post_url TEXT, 
    postText TEXT, 
    pubDate TEXT, 
    commentCount INTEGER, 
    likeCount INTEGER, 
    profileurl TEXT REFERENCES table_profile(profileurl)
);

INSERT OR IGNORE INTO table_profile VALUES
    (null,'url1',10,'user1','blah1')
    ,(null,'url2',100,'user2','blah2')
    ,(null,'url3',10,'user3','blah3')
    /* etc....  */
    ,(null,'url2',100,'user2','blah2') /* purposeful duplicate (ignored) */
;
INSERT OR IGNORE INTO table_post VALUES 
    (null,(SELECT profile_id FROM table_profile WHERE profileurl = 'url2'),'post_url1','post text 1st post','2020-04-01',5,7,'url2')
    ,(null,(SELECT profile_id FROM table_profile WHERE profileurl = 'url1'),'post_url2','post text 2nd post','2020-04-01',5,7,'url1')
    ,(null,(SELECT profile_id FROM table_profile WHERE profileurl = 'url1'),'post_url3','post text 3rd post','2020-04-01',5,7,'url1')
    ,(null,(SELECT profile_id FROM table_profile WHERE profileurl = 'url3'),'post_url4','post text 4th post','2020-04-01',5,7,'url3')
    ,(null,(SELECT profile_id FROM table_profile WHERE profileurl = 'url2'),'post_url5','post text 5th post','2020-04-01',5,7,'url2')
    ,(null,(SELECT profile_id FROM table_profile WHERE profileurl = 'url1'),'post_url6','post text 6th post','2020-04-01',5,7,'url1')
    /* etc .... */
;
INSERT OR IGNORE INTO table_post_alt VALUES
    (null,'does not matter','post_url1','post text 1st post','2020-04-01',5,7,'url2')
    ,(null,'does not matter','post_url2','post text 2nd post','2020-04-01',5,7,'url1')
    ,(null,'does not matter','post_url3','post text 3rd post','2020-04-01',5,7,'url1')
    ,(null,'does not matter','post_url4','post text 4th post','2020-04-01',5,7,'url3')
    ,(null,'does not matter','post_url5','post text 5th post','2020-04-01',5,7,'url2')
    ,(null,'does not matter','post_url6','post text 6th post','2020-04-01',5,7,'url1')
;

SELECT * FROM table_profile
JOIN table_post ON table_profile.profileUrl = table_post.profileUrl; 
SELECT * FROM table_profile
JOIN table_post_alt ON table_profile.profileUrl = table_post_alt.profileUrl; 

Running the above results in (using your query):-

[Image: result of the first query]

And (using your query modified just for the alternative table name):-

[Image: result of the second query]

  • Note that the value 'does not matter' has been used to show that, as the FK is profileurl, it doesn't matter that profile_id doesn't match. (You could use the same method as in the first option to get the correct value, BUT then the data isn't fully normalised; it isn't normalised in the original anyway, as you have two occurrences of profileurl.)
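If data was loaded with foreign key handling turned off (as the UPDATE route requires), SQLite's PRAGMA foreign_key_check can report afterwards which rows break a constraint. A minimal sketch with cut-down tables and made-up rows, including one deliberately bad post:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Enforcement explicitly off to simulate a bulk load done without FK checks
conn.execute("PRAGMA foreign_keys = OFF")
conn.executescript("""
CREATE TABLE table_profile (profile_id INTEGER PRIMARY KEY, profileurl TEXT UNIQUE);
CREATE TABLE table_post (
    post_id INTEGER PRIMARY KEY,
    profile_id INTEGER REFERENCES table_profile(profile_id)
);
INSERT INTO table_profile (profileurl) VALUES ('url1'), ('url2');
-- post 3 points at profile 99, which does not exist
INSERT INTO table_post (profile_id) VALUES (1), (2), (99);
""")

# Each result row is (child table, child rowid, parent table, foreign key index)
violations = conn.execute("PRAGMA foreign_key_check").fetchall()
for table, rowid, parent, fkid in violations:
    print(f"{table} row {rowid} references a missing row in {parent}")
```

An empty result means every post refers to an existing profile, so it is then safe to turn enforcement back on for normal use.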
