Using the code below, I extract data from a BigQuery table, where I have done some data cleaning (updating null values):
query2 = """SELECT id, movie_name FROM `testing_dataset` WHERE id IS NULL LIMIT 4;"""
exec1 = client.query(query2).to_dataframe()
print(exec1)
id movie_name
0 NaN captain-fantastic
1 NaN passengers
2 NaN transformers
3 NaN guardians-of-the-galaxy-vol.2
import json
from time import sleep
from urllib.request import urlopen
from urllib.error import HTTPError

movie_buff_uuid = []
for i in exec1["movie_name"]:
    try:
        url = urlopen(f"https://www.moviebuff.com/{i}.json")
        uuid = json.loads(url.read())['uuid']
        movie_buff_uuid.append(uuid)
    except HTTPError:
        movie_buff_uuid.append(i)
    sleep(5)
df1 = exec1.assign(id=movie_buff_uuid)
print(df1)
The final table looks something like this:
id movie_name
0 f8379c86-1307-4b22-b175-5000284ef6b9 captain-fantastic
1 8f0c611a-4356-454d-a6d6-aac437519540 passengers
2 7cd2dffa-cb31-4897-a7b0-30dcaee66104 transformers
3 9f1a0e6c-e047-444c-8a8e-e14c1e22bf5c guardians-of-the-galaxy-vol-2
Using Python, I got the expected output, but I do not know how to append or update this in my BigQuery table.
Note: I have taken some sample data here. What if I have thousands of rows, and how do I update the values for those rows in the BigQuery table?
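As an aside on scaling: once the name-to-uuid pairs have been collected, they can be applied to the dataframe as a single vectorized lookup rather than row-by-row positional assignment. A minimal local sketch, assuming a mapping has already been built (the sample values below are hypothetical, no BigQuery involved):

```python
import pandas as pd

# Hypothetical sample: rows whose id is missing, plus a name -> uuid mapping
# collected from the API (values here are made up for illustration).
df = pd.DataFrame({
    "id": [None, None],
    "movie_name": ["captain-fantastic", "passengers"],
})
uuid_map = {
    "captain-fantastic": "f8379c86-1307-4b22-b175-5000284ef6b9",
    "passengers": "8f0c611a-4356-454d-a6d6-aac437519540",
}

# Fill missing ids by looking up each movie_name; unmatched names stay NaN.
df["id"] = df["id"].fillna(df["movie_name"].map(uuid_map))
print(df)
```

This scales to thousands of rows without any per-row Python loop over the dataframe itself.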
Modified code:
query2 = """SELECT movie_buff_uuid, movie_name, production_studio, movie_hocode FROM `testing_dataset` WHERE movie_hocode = 'HO00007315';"""
exec1 = client.query(query2).to_dataframe()
exec1['movie_name'] = exec1['movie_name'].str.lower()
print(exec1)
movie_buff_uuid = []
for i in exec1["movie_name"]:
    try:
        url = urlopen(f"https://www.moviebuff.com/{i}.json")
        uuid = json.loads(url.read())['uuid']
        movie_buff_uuid.append(uuid)
    except HTTPError:
        movie_buff_uuid.append(i)
    sleep(5)
df1 = exec1.assign(movie_buff_uuid=movie_buff_uuid)
print(df1)
job = client.load_table_from_dataframe(dataframe=df1, destination="ecr-data-project.testing_dataset.updated_dataset")
job.result()  # wait for the load to finish before updating
query3 = """
UPDATE `ecr-data-project.testing_dataset.movie_metadata_v3a` t
SET t.movie_buff_uuid = u.movie_buff_uuid
FROM `ecr-data-project.testing_dataset.updated_dataset` u
WHERE t.movie_name = u.movie_name;
DROP TABLE `ecr-data-project.testing_dataset.updated_dataset`;
"""
client.query(query3).result()
new_query3 = """
UPDATE `ecr-data-project.testing_dataset.movie_metadata_v3a` t
SET t.movie_buff_uuid = u.movie_buff_uuid
FROM `ecr-data-project.testing_dataset.updated_dataset` u
WHERE t.movie_name = u.movie_name;"""
You may update rows in the table using an UPDATE SQL statement like this:
UPDATE `testing_dataset` SET movie_name = xxx WHERE id = yyy
Check the BigQuery DML documentation here.
The simplest way to approach this would be to load your dataframe into a staging table in BigQuery using the load_table_from_dataframe() method.
Example:
client.load_table_from_dataframe(dataframe=df1, destination='project_id.dataset_id.updated_dataset')
The above will create a table in BigQuery containing the data from your dataframe, including the updated ID values.
Once the data is in BigQuery, you can issue an UPDATE statement similar to this:
UPDATE project_id.dataset_id.testing_dataset t
SET t.id = u.id
FROM project_id.dataset_id.updated_dataset u
WHERE t.movie_name=u.movie_name
This will replace the IDs in your testing_dataset with all matches from the staging table whose IDs you updated in Python.
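Conceptually, this UPDATE is a left join on movie_name. A minimal local sketch of the same matching logic in pandas, with hypothetical data, can help verify the behaviour before running it against BigQuery:

```python
import pandas as pd

# Hypothetical target table (ids missing) and staging table (ids filled in).
target = pd.DataFrame({
    "id": [None, None],
    "movie_name": ["captain-fantastic", "passengers"],
})
staging = pd.DataFrame({
    "id": ["f8379c86-1307-4b22-b175-5000284ef6b9",
           "8f0c611a-4356-454d-a6d6-aac437519540"],
    "movie_name": ["captain-fantastic", "passengers"],
})

# Same effect as UPDATE ... SET t.id = u.id ... WHERE t.movie_name = u.movie_name:
# join on movie_name and take the staging id wherever a match exists.
merged = target.merge(staging, on="movie_name", how="left", suffixes=("", "_new"))
merged["id"] = merged["id_new"].combine_first(merged["id"])
merged = merged.drop(columns="id_new")
print(merged)
```

Rows without a match in the staging table keep their original id, mirroring the SQL semantics.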
Once the data is updated, you can drop the updated_dataset table.
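The cleanup can be a single statement once the update succeeds (project_id and dataset_id are placeholders, matching the example above):

```sql
DROP TABLE project_id.dataset_id.updated_dataset;
```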