How to update a DataFrame row by row using Python pandas

I don't know whether this can be done with Python pandas. Here is the scenario I'm working on.

I created a database connection to MSSQL using Python (pyodbc, SQLAlchemy).

I read one table and saved it as a DataFrame like this:

data = pd.read_sql_table('ENCOUNTERP1', conn)

and the DataFrame looks like this:

   ENCOUNTERID DIAGCODE DIAGSEQNO POA DIAGVERFLAG
0        78841   3GRNFC         3   P
1        89960                  6
2        86479  N18BZON         9   K
3        69135    MPPY3         9   9           0
4        32422   DS6SBT         2               P
5        69135                  4   D           H
6        92019      PP0         1
7        42105                  2               L
8        99256        U         1               J
9        33940  II9ZODF         3   2
10       33940       OH         1
11       65108   CI6COE         8   U
12       77871   Y3ZHN1         7               S
13       65108  73BJBZV         8   7
14       99256        7         1               T

Now I have one more DataFrame, dp = pd.read_sql_table('tblDiagnosis', conn), which contains the diagnosis codes (its Code column); they are all unique.

I want to take those codes from dp and write them into data['DIAGCODE'].

I tried to iterate over each row and update the other DataFrame row by row, but in this code the inner for loop starts again from index 0 on every pass of the outer loop, so in the end every row is filled with the same value.

for index, row in dp.iterrows():
    for i, r in data.iterrows():
        r['DIAGCODE'] = row['Code']

First of all, the two DataFrames are not the same size. This is the dp DataFrame:

   Code  Description                     Category                        IcdSet
0  001   001 - CHOLERA                   CHOLERA                         9
1  0010  0010 - CHOLERA D/T V. CHOLERAE  CHOLERA                         9
2  0011  0011 - CHOLERA D/T V. EL TOR    CHOLERA                         9
3  0019  0019 - CHOLERA NOS              CHOLERA                         10
4  002   002 - TYPHOID/PARATYPHOID FEV   TYPHOID AND PARATYPHOID FEVERS  9
5  0020  0020 - TYPHOID FEVER            TYPHOID AND PARATYPHOID FEVERS  9

The output should look something like this:

   ENCOUNTERID DIAGCODE DIAGSEQNO POA DIAGVERFLAG
0        78841      001         3   P
1        89960     0010         6
2        86479     0011         9   K
3        69135     0019         9   9           0
4        32422      002         2               P
5        69135     0020         4   D           H
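One loop-free way to produce this output is sketched below. It assumes that rows are paired purely by position and that only the first len(dp) rows of data are kept, which is what the sample output suggests. The two frames are small hand-built stand-ins for the SQL tables, with blank cells written as empty strings.

```python
import pandas as pd

# Stand-in for the first rows of `data` (blank cells as empty strings).
data = pd.DataFrame({
    "ENCOUNTERID": [78841, 89960, 86479, 69135, 32422, 69135, 92019],
    "DIAGCODE": ["3GRNFC", "", "N18BZON", "MPPY3", "DS6SBT", "", "PP0"],
    "DIAGSEQNO": [3, 6, 9, 9, 2, 4, 1],
    "POA": ["P", "", "K", "9", "", "D", ""],
    "DIAGVERFLAG": ["", "", "", "0", "P", "H", ""],
})
# Stand-in for `dp`, Code column only.
dp = pd.DataFrame({"Code": ["001", "0010", "0011", "0019", "002", "0020"]})

# Assumption: pair rows by position and keep only as many rows as dp has.
out = data.iloc[: len(dp)].copy()            # .copy() leaves `data` untouched
out["DIAGCODE"] = dp["Code"].to_numpy()      # .to_numpy() bypasses index alignment
print(out)
```

Assigning a NumPy array instead of the Series means the codes are placed strictly in order, even if the two frames happen to have different index labels.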

I would also like to add a condition based on dp, like this:

for index, row in dp.iterrows():
    for i, r in data.iterrows():
        if row['Code'] == 10:
            r['DIAGCODE'] = row['Code']
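The Code column holds strings such as "0019", so row['Code'] == 10 compares a string with an integer and never matches. The sketch below does a conditional update without loops. It assumes the test was meant for the IcdSet column (a guess, so substitute the real condition) and that rows are again paired by position.

```python
import pandas as pd

data = pd.DataFrame({
    "ENCOUNTERID": [78841, 89960, 86479, 69135],
    "DIAGCODE": ["3GRNFC", "", "N18BZON", "MPPY3"],
})
dp = pd.DataFrame({
    "Code": ["001", "0010", "0011", "0019"],
    "IcdSet": [9, 9, 9, 10],
})

n = min(len(data), len(dp))                        # pair rows by position
# Hypothetical condition: only codes whose IcdSet is 10.
mask = (dp["IcdSet"].iloc[:n] == 10).to_numpy()
target = data.index[:n][mask]                      # labels of rows in data to update
data.loc[target, "DIAGCODE"] = dp["Code"].iloc[:n][mask].to_numpy()
print(data)
```

Rows where the condition is false keep their original DIAGCODE.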

I assume that the two tables have the same number of rows and are both already in the order you want. If that's correct, you can simply use:

df = pd.concat([data, dp], axis=1)

Then select the columns you want:

df = df.ix[:, ['ENCOUNTERID', 'Code', 'DIAGSEQNO', 'POA', 'DIAGVERFLAG']].rename(columns={'Code': 'DIAGCODE'})

If this meets your requirements, please upvote.


Sorry, .ix is deprecated (and was removed entirely in pandas 1.0), so please use:

df = df[['ENCOUNTERID', 'Code', 'DIAGSEQNO', 'POA', 'DIAGVERFLAG']].rename(columns={'Code': 'DIAGCODE'})
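Since data and dp have different lengths, note how pd.concat(axis=1) behaves: it aligns rows by index label, and with the default join="outer" the extra rows of data get NaN in Code. Below is a sketch with small stand-in frames. It assumes positional pairing is what you want and uses join="inner" to keep only the rows present in both frames.

```python
import pandas as pd

data = pd.DataFrame({
    "ENCOUNTERID": [78841, 89960, 86479, 69135],
    "DIAGCODE": ["3GRNFC", "", "N18BZON", "MPPY3"],
    "DIAGSEQNO": [3, 6, 9, 9],
    "POA": ["P", "", "K", "9"],
    "DIAGVERFLAG": ["", "", "", "0"],
})
dp = pd.DataFrame({
    "Code": ["001", "0010", "0011"],
    "Description": ["001 - CHOLERA", "0010 - CHOLERA D/T V. CHOLERAE",
                    "0011 - CHOLERA D/T V. EL TOR"],
})

# reset_index makes both indexes run 0..n-1, so axis=1 pairs rows by position;
# join="inner" drops rows of data with no partner in dp instead of filling NaN.
df = pd.concat([data.reset_index(drop=True), dp.reset_index(drop=True)],
               axis=1, join="inner")

df = df[["ENCOUNTERID", "Code", "DIAGSEQNO", "POA", "DIAGVERFLAG"]].rename(
    columns={"Code": "DIAGCODE"})
print(df)
```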

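BTW, your code has two problems. First, the nested loops assign every Code from dp to every row of data, so the last Code would end up in every row. Second, the row Series yielded by iterrows() is generally a copy, so assigning to it does not change data. Here is a solution that pairs the rows by position in a single loop and writes back with .at: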

for (_, src), (i, _) in zip(dp.iterrows(), data.iterrows()):
    # write through data.at: the row Series from iterrows() is a copy
    data.at[i, 'DIAGCODE'] = src['Code']
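A quick check of the two loops on small stand-in frames. The pandas documentation for iterrows() warns that, depending on the dtypes, each row is a copy, so writing to it has no effect. With the mixed integer and string columns here, the original nested loop leaves data unchanged, while the single zip loop that writes through .at updates it.

```python
import pandas as pd

data = pd.DataFrame({
    "ENCOUNTERID": [78841, 89960, 86479],
    "DIAGCODE": ["3GRNFC", "", "N18BZON"],
})
dp = pd.DataFrame({"Code": ["001", "0010", "0011"]})

# Original attempt: r is a copy built by iterrows(), so these writes are lost.
for index, row in dp.iterrows():
    for i, r in data.iterrows():
        r["DIAGCODE"] = row["Code"]
after_nested = data["DIAGCODE"].tolist()
print(after_nested)  # ['3GRNFC', '', 'N18BZON'] (unchanged)

# Corrected loop: one pass, rows paired by position, written through .at.
for (_, src), (i, _) in zip(dp.iterrows(), data.iterrows()):
    data.at[i, "DIAGCODE"] = src["Code"]
print(data["DIAGCODE"].tolist())  # ['001', '0010', '0011']
```

For larger tables, the vectorized assignment data.loc[data.index[:len(dp)], 'DIAGCODE'] = dp['Code'].to_numpy() avoids the Python-level loop entirely, provided data has at least len(dp) rows.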

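Notice: technical posts on this site are published under the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.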

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM