
Pandas iterate through columns in dataframe for custom MySQL insert string

I am trying to combine the values of the different columns in each row of a DataFrame into a single comma-separated string, so that I can build a custom SQL INSERT statement to execute against a MySQL database. I have 67 columns, and I want to avoid writing code that addresses each column by name, mainly to keep the code reusable for dataframes of different sizes. I could have anywhere from 1 to 2000 rows to iterate through, with one INSERT query per row.

For example, if my DataFrame includes the following:

RecDate       WindDir       WindSpeed       OutdoorTemperature       OutdoorHumidity
20160321      121           3               67.5                     43.8
20160322      87            5               73.1                     53.2
20160323      90            2               71.1                     51.7
20160324      103           7               68.3                     47.0

I want to create a string for each row in the dataframe:

INSERT INTO tablename VALUES (20160321, 121, 3, 67.5, 43.8)
INSERT INTO tablename VALUES (20160322, 87, 5, 73.1, 53.2)
INSERT INTO tablename VALUES (20160323, 90, 2, 71.1, 51.7)
INSERT INTO tablename VALUES (20160324, 103, 7, 68.3, 47.0)

I have considered using the dataframe's to_sql() function, but was not able to get the code to work with my database structure.
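For reference, this is the kind of to_sql() call I attempted (sketched with an in-memory SQLite engine for illustration; the table name and connection string are placeholders, and a real MySQL setup would use a different connection string):

```python
import pandas as pd
from sqlalchemy import create_engine

df = pd.DataFrame({
    'RecDate': [20160321, 20160322],
    'WindSpeed': [3, 5],
})

# In-memory SQLite engine for illustration; a MySQL connection string
# would look like 'mysql+pymysql://user:password@host/dbname'.
engine = create_engine('sqlite://')
df.to_sql('tablename', engine, if_exists='append', index=False)

# Read the rows back to confirm the insert worked.
result = pd.read_sql('SELECT * FROM tablename', engine)
print(result)
```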

So my goal was to iterate through each row and manually create the string inside the parentheses, with values separated by commas:

for index, row in df.iterrows():
   print('INSERT INTO tablename VALUES (%s, %s, %s, %s, %s)' % (row['RecDate'], row['WindDir'], row['WindSpeed'], row['OutdoorTemperature'], row['OutdoorHumidity']))

To make my code more "pythonic" and less rigid, I tried to iterate through each row, appending each column's value followed by a comma:

for index, row in df.iterrows():
    string = ''

    for x in range(len(row)):
        string += '%s, ' % row[x]

    print('INSERT INTO tablename VALUES (%s)' % string)

I routinely get index errors and out-of-bounds errors with the above code, and I'm not sure which route is correct. I'd appreciate a review of my code and thought process, and any recommendations for improving it. My goal is to be as efficient as possible and minimize the amount of code I have to write (especially with 67 columns!), while keeping the code flexible enough to handle a changing number of columns.

Thank you!

Please try the code below:

def cq_processor(x):
    # Convert values to str first: str.join raises a TypeError on numbers
    return 'INSERT INTO tablename VALUES ({})'.format(', '.join(map(str, x)))

df.apply(cq_processor, axis=1)
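For example, with a smaller version of the sample DataFrame above (note that the values must be converted to strings before joining, since str.join rejects numeric values):

```python
import pandas as pd

df = pd.DataFrame({
    'RecDate': [20160321, 20160322],
    'WindDir': [121, 87],
    'WindSpeed': [3, 5],
})

def cq_processor(x):
    # str.join only accepts strings, so convert each value first
    return 'INSERT INTO tablename VALUES ({})'.format(', '.join(map(str, x)))

queries = df.apply(cq_processor, axis=1)
print(queries.iloc[0])
# INSERT INTO tablename VALUES (20160321, 121, 3)
```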

You are getting errors because row does not support numeric indexing.

In other words, calling row[1] is not correct. You must call row['column-name'] instead (or use row.iloc[1] for positional access).

iterrows() does not return a traditional list - it returns a generator yielding pairs of an index label and a Series object. From the source, the function is defined as follows:

columns = self.columns
for k, v in zip(self.index, self.values):
    s = Series(v, index=columns, name=k)
    yield k, s

If you know your pandas, you'll see that the index=columns bit tells the Series to accept the column names as valid labels. Only when this argument is not specified does a Series default to allowing integer-based indexing.
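A quick demonstration of the difference, using a tiny two-column frame (behavior of plain integer indexing varies by pandas version, so only label-based and .iloc access are shown as working):

```python
import pandas as pd

df = pd.DataFrame({'WindDir': [121], 'WindSpeed': [3]})

for index, row in df.iterrows():
    print(row['WindDir'])   # label-based lookup works: 121
    print(row.iloc[0])      # positional lookup must go through .iloc: 121
    # row[0] is ambiguous here; in recent pandas versions it raises a
    # KeyError, because 0 is not one of the column labels
```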

tl;dr: Use your first approach. It is the correct way to index this particular Series object. Consider using .format() instead of %-formatting to make it more Pythonic.
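A version of that first approach that also generalizes to any number of columns might look like this (a sketch with placeholder data, not your exact table):

```python
import pandas as pd

df = pd.DataFrame({
    'RecDate': [20160321, 20160322],
    'WindDir': [121, 87],
})

# Iterate over the row's values directly instead of indexing by position,
# so the code works no matter how many columns the frame has.
queries = []
for index, row in df.iterrows():
    values = ', '.join(str(v) for v in row)
    queries.append('INSERT INTO tablename VALUES ({})'.format(values))

print(queries[0])
# INSERT INTO tablename VALUES (20160321, 121)
```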
