Insert Pandas dataframe into Cassandra Table

According to the documentation, data can be inserted into a table like this:

session.execute(
    """
    INSERT INTO users (name, credits, user_id)
    VALUES (%s, %s, %s)
    """,
    ("John O'Reilly", 42, uuid.uuid1())
)

The column names have to be stated there. However, in my case, I have a DataFrame with only a header row and one row of data, for example: "sepal_length": 5.1, "sepal_width": 3.5, "petal_length": 1.4, "petal_width": 0.2, "species": "Iris".

The user will provide the information my API needs to connect to a table in their Cassandra database, and that table contains the same column names as the DataFrame. How can I insert the DataFrame's data, mapping each column header to the matching table column, without hardcoding the column names as in the documentation? The headers differ from case to case.

I am trying to achieve something like this:

def insert_table(df, table_name, ... #connection details):
    #Set up connection and session
    session.execute(
        """
        INSERT INTO table_name(#df's column header)
        VALUES (%s, %s, %s)
        """,
        (#df's data for the only row)
    ) 

I discovered this, but I just need a simple insert operation.

You can get the DataFrame's column names with the following:

column_names = list(my_dataframe.columns.values)

You could rewrite insert_table(...) to accept the list of column names as an argument.

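For example, string substitution can be used to form the CQL statement. The number of %s placeholders has to match the number of columns, so build that part from the column list as well:

# One %s placeholder per column, so the statement fits any header
placeholders = ', '.join(['%s'] * len(column_names))
cql_query = """
    INSERT INTO {table_name} ({col_names})
    VALUES ({placeholders})
    """.format(table_name="my_table",
               col_names=', '.join(map(str, column_names)),
               placeholders=placeholders)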
...
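
Putting the pieces together, here is a minimal sketch of what insert_table could look like. The helper names build_insert and insert_first_row are hypothetical, and the sketch assumes that session is an already connected cassandra-driver Session, that the DataFrame headers match the table's column names exactly, and that all names are plain unquoted identifiers (no keyspace prefix). Because table and column names cannot be passed as bound parameters, they are formatted into the string, so the sketch checks them against a conservative pattern first.

```python
import re

# Assumption: only simple unquoted identifiers are allowed
# (a letter followed by letters, digits or underscores).
_IDENTIFIER = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")


def build_insert(table_name, column_names):
    """Return an INSERT statement with one %s placeholder per column."""
    for name in [table_name, *column_names]:
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    placeholders = ", ".join(["%s"] * len(column_names))
    return (
        f"INSERT INTO {table_name} ({', '.join(column_names)}) "
        f"VALUES ({placeholders})"
    )


def insert_first_row(session, table_name, df):
    """Insert the first row of df into table_name via an open session."""
    column_names = [str(c) for c in df.columns]
    # Values of the first (only) row, in the same order as the columns
    values = tuple(df.iloc[0].tolist())
    session.execute(build_insert(table_name, column_names), values)
```

With the iris row from the question, insert_first_row(session, "iris", df) would execute INSERT INTO iris (sepal_length, sepal_width, petal_length, petal_width, species) VALUES (%s, %s, %s, %s, %s) with the row's five values as parameters. The values themselves still go through the driver's parameter binding, so only the identifiers need the extra check.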
