I have a Dataflow Pipeline with streaming data, and I am using an Apache Beam Side Input of a bounded data source, which may have updates. How do I tr ...
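A common way to handle this is Beam's slowly-updating side input pattern: a PeriodicImpulse re-reads the bounded source on an interval, and the main stream is windowed to match so each window sees the freshest snapshot. A minimal Python sketch, assuming a Pub/Sub main input and a hypothetical load_lookup_table loader:

```python
import apache_beam as beam
from apache_beam.transforms import window
from apache_beam.transforms.periodicsequence import PeriodicImpulse

# Hypothetical loader: re-reads the whole bounded source (file, table snapshot, ...)
# on every impulse and returns it as a dict.
def load_lookup_table(_impulse_ts):
    return {"key": "value"}  # placeholder

with beam.Pipeline() as p:
    # Fire every 300s; apply_windowing=True puts each impulse in its own
    # fixed window of the same size, so AsSingleton sees one value per window.
    side = (
        p
        | "Impulse" >> PeriodicImpulse(fire_interval=300, apply_windowing=True)
        | "Reload" >> beam.Map(load_lookup_table)
    )

    # Streaming main input, windowed to match the side input's refresh interval.
    main = (
        p
        | "ReadStream" >> beam.io.ReadFromPubSub(topic="projects/p/topics/t")
        | "Window" >> beam.WindowInto(window.FixedWindows(300))
    )

    enriched = main | "Enrich" >> beam.Map(
        lambda msg, lookup: (msg, lookup.get("key")),
        lookup=beam.pvalue.AsSingleton(side),
    )
```

Because both sides use aligned fixed windows of the same length, each main-input window is joined against exactly one reloaded snapshot.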
I'm building a data pipeline using Python and I'm running into an issue when trying to execute a certain function. The error message I'm receiving is: ...
I am working on redesigning a data pipeline that is responsible for importing customer data in CSV format from cloud buckets that customers own (We hav ...
I have 2 tables where the 2nd is dependent on the 1st. Whenever new records are added to the 1st, I want to run a Dagster job. I came across sensors but I am not ...
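Sensors are the fit here: poll the 1st table on an interval, keep the last-seen key in the sensor cursor, and emit a RunRequest only when new rows appear. A sketch, assuming an integer primary key and a hypothetical get_max_id() helper:

```python
from dagster import RunRequest, SkipReason, job, op, sensor

@op
def refresh_second_table():
    ...  # rebuild/update the dependent 2nd table here

@job
def refresh_job():
    refresh_second_table()

# Hypothetical helper: returns the max primary key currently in the 1st table.
def get_max_id() -> int:
    ...

@sensor(job=refresh_job, minimum_interval_seconds=60)
def new_rows_sensor(context):
    last_seen = int(context.cursor) if context.cursor else 0
    current_max = get_max_id()
    if current_max > last_seen:
        # Persist the watermark so the next evaluation only sees newer rows.
        context.update_cursor(str(current_max))
        yield RunRequest(run_key=f"rows-up-to-{current_max}")
    else:
        yield SkipReason("No new rows in the 1st table")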
I'm using an S3 Sink Connector to write records to S3 from Kafka. Eventually I will be using Kafka to capture CDC Packets from my Database and then wr ...
Source: CSV files located in a shared drive (on-prem server). Access to this shared drive and folder is controlled using a security group. Expectation ...
I want to create an ADF data pipeline that compares both tables and, after the comparison, adds the missing rows from table A to table B. table A - 10 ...
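The comparison itself is usually easiest to push down to the database as a single anti-join INSERT, which in ADF could run from a Script or Stored Procedure activity. A sketch of that statement driven from Python via pyodbc, assuming both tables live in the same SQL database and share an `id` key (the DSN is hypothetical):

```python
import pyodbc

# Insert every row of tableA whose id does not yet exist in tableB.
INSERT_MISSING = """
INSERT INTO tableB
SELECT a.*
FROM tableA AS a
WHERE NOT EXISTS (SELECT 1 FROM tableB AS b WHERE b.id = a.id);
"""

conn = pyodbc.connect("DSN=mydb")  # hypothetical connection string
with conn:  # pyodbc commits the transaction on clean exit
    conn.execute(INSERT_MISSING)
```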
I am trying to deploy a data ingestion pipeline in Google Cloud Functions. When I trigger the URL, I get the following error: Error: Forbidden Yo ...
Input Data: extension.yaml input: label: "" file: paths: [./input/*] codec: lines max_buffer: 1000000 delete_on_finish: false ...
For example: Input Data: {"date":"03-11-22", "message":"This is message"}, {"date":"03-30-22", "message":"This is message"}, {"date":"04-03-22", "mes ...
I've written some CDK code to programmatically create a data pipeline that backs up a DynamoDB table into an S3 bucket on a daily basis. But it keeps ...
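For comparison, one minimal CDK v2 (Python) shape for this is an EventBridge rule on a 1-day schedule invoking a Lambda that calls DynamoDB's native export-to-S3 API. A sketch, with the table ARN and bucket name hypothetical, IAM grants omitted, and point-in-time recovery assumed enabled on the table:

```python
from aws_cdk import (
    Stack, Duration,
    aws_events as events,
    aws_events_targets as targets,
    aws_lambda as lambda_,
)
from constructs import Construct

# Inline handler: exports the table to S3 (requires PITR on the table).
HANDLER = """
import os, boto3

def handler(event, context):
    boto3.client("dynamodb").export_table_to_point_in_time(
        TableArn=os.environ["TABLE_ARN"],
        S3Bucket=os.environ["BUCKET"],
    )
"""

class BackupStack(Stack):
    def __init__(self, scope: Construct, id_: str, **kwargs) -> None:
        super().__init__(scope, id_, **kwargs)

        fn = lambda_.Function(
            self, "ExportFn",
            runtime=lambda_.Runtime.PYTHON_3_11,
            handler="index.handler",
            code=lambda_.Code.from_inline(HANDLER),
            environment={  # hypothetical values
                "TABLE_ARN": "arn:aws:dynamodb:us-east-1:123456789012:table/MyTable",
                "BUCKET": "my-backup-bucket",
            },
        )
        # NOTE: grant fn dynamodb:ExportTableToPointInTime plus S3 write access.

        events.Rule(
            self, "Daily",
            schedule=events.Schedule.rate(Duration.days(1)),
            targets=[targets.LambdaFunction(fn)],
        )
```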
Could someone please help with the following scenario? This Data Pipeline has multiple activities (Set Variable) targeting a single activity, Send Em ...
I have a .csv in PowerBI and I need to automate a process to do daily uploads to BigQuery. First of all, what Python libraries should I keep in mind t ...
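The official google-cloud-bigquery client covers the load itself; scheduling the daily run (cron, Cloud Scheduler, etc.) is a separate concern. A sketch, assuming a local CSV export and a hypothetical target table:

```python
from google.cloud import bigquery

client = bigquery.Client()  # uses application-default credentials
table_id = "my-project.my_dataset.my_table"  # hypothetical target table

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,              # skip the header row
    autodetect=True,                  # infer the schema from the file
    write_disposition="WRITE_APPEND", # append each daily upload
)

with open("export.csv", "rb") as f:
    job = client.load_table_from_file(f, table_id, job_config=job_config)
job.result()  # block until the load job finishes
```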
I'm currently working on a project that involves using snakemake to run svaba, a variant caller, on genome data. svaba run can take multiple sample ...
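A per-sample svaba run maps naturally onto a wildcard rule plus an expand() in rule all. A minimal Snakefile sketch (snakemake's Python-based DSL), assuming paired tumor/normal BAMs and with all sample names and paths hypothetical:

```python
SAMPLES = ["sampleA", "sampleB"]  # hypothetical sample names

rule all:
    input:
        expand("results/{sample}.svaba.somatic.sv.vcf", sample=SAMPLES)

rule svaba:
    input:
        tumor="bams/{sample}.tumor.bam",
        normal="bams/{sample}.normal.bam",
        ref="ref/genome.fa",
    output:
        "results/{sample}.svaba.somatic.sv.vcf",
    threads: 4
    shell:
        # -a sets the output prefix; svaba names its VCFs from it.
        "svaba run -t {input.tumor} -n {input.normal} -G {input.ref} "
        "-p {threads} -a results/{wildcards.sample}"
```

Snakemake then schedules one svaba job per sample and can run them in parallel up to the core limit you give it.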
After using the multiple database tables plugin to load data into BigQuery, I would like to make the load incremental for every table in one data pipeline. I w ...
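One common incremental pattern is landing each extract in a per-table staging table and folding it into the target with a MERGE keyed on the primary key. A sketch with the BigQuery Python client, where the project, dataset, table list, and columns are all hypothetical:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical layout: staging_<table> holds the latest extract,
# target_<table> is deduplicated on `id`, newest `updated_at` wins.
MERGE_SQL = """
MERGE `proj.ds.target_{table}` AS t
USING `proj.ds.staging_{table}` AS s
ON t.id = s.id
WHEN MATCHED AND s.updated_at > t.updated_at THEN
  UPDATE SET payload = s.payload, updated_at = s.updated_at
WHEN NOT MATCHED THEN
  INSERT (id, payload, updated_at) VALUES (s.id, s.payload, s.updated_at)
"""

for table in ["orders", "customers"]:  # hypothetical table list
    client.query(MERGE_SQL.format(table=table)).result()
```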
I was going to extract data from Google Analytics with Python and the Google Analytics Reporting API v4; in the middle, I need the Google Analytics Dimensions ...
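With the v4 Reporting API, dimensions are passed per reportRequest in reports().batchGet and come back under columnHeader.dimensions. A sketch using google-api-python-client and a service account, with the key file and view ID hypothetical:

```python
from google.oauth2 import service_account
from googleapiclient.discovery import build

SCOPES = ["https://www.googleapis.com/auth/analytics.readonly"]
creds = service_account.Credentials.from_service_account_file(
    "key.json", scopes=SCOPES)  # hypothetical key file
analytics = build("analyticsreporting", "v4", credentials=creds)

response = analytics.reports().batchGet(body={
    "reportRequests": [{
        "viewId": "123456789",  # hypothetical view ID
        "dateRanges": [{"startDate": "7daysAgo", "endDate": "today"}],
        "dimensions": [{"name": "ga:date"}, {"name": "ga:sourceMedium"}],
        "metrics": [{"expression": "ga:sessions"}],
    }]
}).execute()

# Pair each row's dimension values with the dimension names from the header.
for report in response.get("reports", []):
    dims = report["columnHeader"].get("dimensions", [])
    for row in report["data"].get("rows", []):
        print(dict(zip(dims, row["dimensions"])), row["metrics"][0]["values"])
```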
I would like to create a data pipeline with Apache NiFi (for learning purposes), but after installing jdk-17.0.3.1_windows-x64_bin and downloading NiFi 1.16.3 ...
I am building a new data pipeline for our team. This data pipeline would collect data from multiple sources and ingest them into a single table. I am ...
I would like to know if anyone has implemented Camunda as a scheduler and orchestrator of data pipelines/ETL and can share their experience. What are the pro ...
Why does PyTorch create another repo called TorchData for similar/new Dataset and DataLoader abstractions instead of adding them to the existing PyTorch repo? What's ...