
How to modify the existing table

Using the code below, I extract data from a BigQuery table, and on that data I have done some cleaning (updating null values).

query2 ="""SELECT id,movie_name FROM `testing_dataset` where id is null LIMIT 4;"""
exec1 = client.query(query2).to_dataframe()
print(exec1)

  id                     movie_name
0 NaN              captain-fantastic
1 NaN                     passengers
2 NaN                   transformers
3 NaN  guardians-of-the-galaxy-vol.2

from urllib.request import urlopen
from urllib.error import HTTPError
from time import sleep
import json

movie_buff_uuid = []
for i in exec1["movie_name"]:
    try:
        # look up the movie's uuid from moviebuff.com
        url = urlopen(f"https://www.moviebuff.com/{i}.json")
        uuid = json.loads(url.read())['uuid']
        movie_buff_uuid.append(uuid)
    except HTTPError:
        # keep the movie name as a placeholder when the lookup fails
        movie_buff_uuid.append(i)
    sleep(5)

df1 = exec1.assign(id=movie_buff_uuid)
print(df1)

The final table looks something like this:

   id                                                      movie_name
0  f8379c86-1307-4b22-b175-5000284ef6b9              captain-fantastic
1  8f0c611a-4356-454d-a6d6-aac437519540                     passengers
2  7cd2dffa-cb31-4897-a7b0-30dcaee66104                   transformers
3  9f1a0e6c-e047-444c-8a8e-e14c1e22bf5c  guardians-of-the-galaxy-vol-2

Using Python, I have got the expected output, but I do not know how to append or update this in my BigQuery table.

Note: I have taken some sample data here. What if I have thousands of rows? How do I update the values for those rows in the BigQuery table?

Modified code:

query2 ="""SELECT movie_buff_uuid,movie_name,production_studio,movie_hocode FROM`testing_dataset`where movie_hocode =  'HO00007315';"""
exec1 = client.query(query2).to_dataframe()
exec1['movie_name'] = exec1['movie_name'].str.lower()
print(exec1)


movie_buff_uuid= []

for i in exec1["movie_name"]:
    try:
        url = urlopen(f"https://www.moviebuff.com/{i}.json")
        uuid = json.loads(url.read())['uuid']
        movie_buff_uuid.append(uuid)
    except HTTPError:
        movie_buff_uuid.append(i)
    sleep(5)  


df1 = exec1.assign(movie_buff_uuid=movie_buff_uuid)
print(df1)

job = client.load_table_from_dataframe(dataframe=df1, destination="ecr-data-project.testing_dataset.updated_dataset")
job.result()  # wait for the load job to finish before running the UPDATE

    query3 ="""
UPDATE `ecr-data-project.testing_dataset.movie_metadata_v3a` t
SET t.movie_buff_uuid = u.movie_buff_uuid
FROM `ecr-data-project.testing_dataset.movie_metadata_v3a.updated_dataset` u
WHERE t.movie_name=u.movie_name
DROP TABLE `ecr-data-project.testing_dataset.movie_metadata_v3a.updated_dataset`;
;"""

exec2 = client.query(query3).to_dataframe()
print(exec2)

new_query3 ="""
UPDATE ecr-data-project.testing_dataset.movie_metadata_v3a t
SET t.movie_buff_uuid = u.movie_buff_uuid
FROM ecr-data-project.testing_dataset.movie_metadata_v3a.updated_dataset u
WHERE t.movie_name=u.movie_name;"""

You can update rows in the table using an UPDATE SQL statement like this:

UPDATE `testing_dataset` SET movie_name = xxx WHERE id = yyy

Check the BigQuery DML documentation here.
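If you want to run such an UPDATE from Python, a minimal sketch using the google-cloud-bigquery client could look like the following; the table ID and parameter values are placeholders, not values taken from your real table:

from google.cloud import bigquery

client = bigquery.Client()
# Parameterized values keep the UPDATE safe to reuse for many rows (placeholder values shown)
job_config = bigquery.QueryJobConfig(
    query_parameters=[
        bigquery.ScalarQueryParameter("new_id", "STRING", "f8379c86-1307-4b22-b175-5000284ef6b9"),
        bigquery.ScalarQueryParameter("movie_name", "STRING", "captain-fantastic"),
    ]
)
sql = """
UPDATE `project_id.dataset_id.testing_dataset`
SET id = @new_id
WHERE movie_name = @movie_name
"""
client.query(sql, job_config=job_config).result()  # wait for the DML job to finish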

The simplest way to approach this would be to load your dataframe into a staging table in BigQuery using the load_table_from_dataframe() method.

example:

client.load_table_from_dataframe(dataframe=df1, destination='project_id.dataset_id.updated_dataset')

The above will create a table in BigQuery containing the data from your dataframe, which includes the updated ID values.
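As a slightly fuller sketch, you can pass a job configuration so repeated runs overwrite the staging table, and wait for the load job to finish before issuing the UPDATE; the table ID below is a placeholder:

from google.cloud import bigquery

client = bigquery.Client()
staging_table = "project_id.dataset_id.updated_dataset"  # placeholder staging table ID

# WRITE_TRUNCATE overwrites the staging table on each run instead of appending duplicate rows
job_config = bigquery.LoadJobConfig(write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE)
load_job = client.load_table_from_dataframe(df1, staging_table, job_config=job_config)
load_job.result()  # block until the load job completes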

Once the data is in BigQuery, you can issue an UPDATE statement similar to this:

UPDATE project_id.dataset_id.testing_dataset t
SET t.id = u.id
FROM project_id.dataset_id.updated_dataset u
WHERE t.movie_name=u.movie_name

This will replace the IDs in your testing_dataset with the matching values from the staging table whose IDs you updated in Python.
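The same statement can be run from Python, reusing the client from the sketch above; the number of affected rows is available on the finished job (table IDs are placeholders):

update_sql = """
UPDATE `project_id.dataset_id.testing_dataset` t
SET t.id = u.id
FROM `project_id.dataset_id.updated_dataset` u
WHERE t.movie_name = u.movie_name
"""
update_job = client.query(update_sql)
update_job.result()  # block until the DML statement finishes
print(f"Rows updated: {update_job.num_dml_affected_rows}")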

Once the data is updated, you can drop the updated_dataset table.
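Dropping the staging table can also be done from Python, for example (the table ID is a placeholder):

# Remove the staging table once the UPDATE has completed; not_found_ok avoids an error if it was already deleted
client.delete_table("project_id.dataset_id.updated_dataset", not_found_ok=True)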
