Trying to sort two pandas DataFrames and then copy a column from one to the other
I have two CSV files that I want to manipulate and then combine into one file. I first loaded them into pandas DataFrames. One DataFrame looks like this:
      Number  Quiz
0  111111145     0
1  111111108     1
2  111111123     1
3  111111114     0
4  111111132     0
The other looks like this (its Quiz column is empty):
   Last Name  First Name     Number  Quiz
0   Student1    Student1  111111123
1   Student2    Student2  111111114
2   Student3    Student3  111111132
3   Student4    Student4  111111145
4   Student5    Student5  111111108
I want to end up with something like this:
   Last Name  First Name     Number  Quiz
0   Student1    Student1  111111108     1
1   Student2    Student2  111111114     0
2   Student3    Student3  111111123     1
3   Student4    Student4  111111132     0
4   Student5    Student5  111111145     0
But when I run my code, I get:
   Last Name  First Name     Number  Quiz
0   Student1    Student1  111111108     0
1   Student2    Student2  111111114     1
2   Student3    Student3  111111123     0
3   Student4    Student4  111111132     1
4   Student5    Student5  111111145     0
I am not sure why. My code is as follows:
import argparse
import sys, re
import numpy as np
import smtplib
from random import randint
import csv
import math
import pandas as pd

parser = argparse.ArgumentParser()
parser.add_argument('-cname', '--c', help='column name to copy')
parser.add_argument('-source', '--s', help='source file with the column to copy')
parser.add_argument('-target', '--t', help='the target file with the names and UINS')
parser.add_argument('-out', '--f', help='output file with column copied')
if len(sys.argv) == 1:
    parser.print_help()
    sys.exit(1)
args = parser.parse_args()

sourceFile = pd.read_csv(args.s)
targetFile = pd.read_csv(args.t)
print(sourceFile)
print(targetFile)

del targetFile[args.c]
sourceFile.sort_values('UIN', ascending=True, inplace=True)
targetFile.sort_values('UIN', ascending=True, inplace=True)
print(sourceFile)
print(targetFile)

targetFile[args.c] = sourceFile[args.c]
targetFile.to_csv(args.f, index=False)
print(targetFile)
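One likely cause of the mismatch, based on how pandas assignment works: `targetFile[args.c] = sourceFile[args.c]` aligns the two frames on their index labels, not on row position. `sort_values` reorders rows but keeps each row's original label, so every target row receives the Quiz value of whichever source row had the same original label. The sketch below reproduces this with the sample data from the question, using "Number" as the key because that is the column shown in the tables (the script itself sorts on 'UIN'). It also shows a positional copy via `.to_numpy()`, which only works if both files contain exactly the same set of numbers.

```python
import pandas as pd

# Sample data from the question (the target's empty Quiz column is left out).
source = pd.DataFrame({
    "Number": [111111145, 111111108, 111111123, 111111114, 111111132],
    "Quiz": [0, 1, 1, 0, 0],
})
names = ["Student1", "Student2", "Student3", "Student4", "Student5"]
target = pd.DataFrame({
    "Last Name": names,
    "First Name": names,
    "Number": [111111123, 111111114, 111111132, 111111145, 111111108],
})

# Sorting reorders the rows but keeps their original index labels.
source_sorted = source.sort_values("Number")
target_sorted = target.sort_values("Number")

# Assigning a Series aligns on index labels: each target row gets the Quiz
# value of the source row with the same *original* label.
aligned = target_sorted.copy()
aligned["Quiz"] = source_sorted["Quiz"]

# A NumPy array carries no index, so the values are assigned by position.
# This is only correct if both frames hold exactly the same Numbers.
positional = target_sorted.copy()
positional["Quiz"] = source_sorted["Quiz"].to_numpy()

print(aligned[["Number", "Quiz"]].to_string(index=False))
print(positional[["Number", "Quiz"]].to_string(index=False))
```

The `aligned` frame reproduces the unexpected 0, 1, 0, 1, 0 pattern from the question, while `positional` gives the desired 1, 0, 1, 0, 0. Matching on the key value instead of position avoids the same-set-of-numbers assumption entirely.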
You should use a merge to get your output:
merged = df1.merge(df2, on="Number")
That should work, but if df1 also has a "Quiz" column you will end up with it duplicated: pandas keeps both and renames them Quiz_x and Quiz_y.
To avoid this, remove the Quiz column from your first DataFrame before merging (this slice assumes Quiz is its last column):
merged = df1[df1.columns[:-1]].merge(df2, on="Number")
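A small sketch of both points on the question's sample data, assuming df1 is the target file read with its empty Quiz column (which `read_csv` would load as NaN) and df2 is the source file. It shows the Quiz_x/Quiz_y clash and compares the answer's positional slice with a name-based `drop`, which does not depend on Quiz being the last column.

```python
import numpy as np
import pandas as pd

names = ["Student1", "Student2", "Student3", "Student4", "Student5"]
# df1: target file, still carrying its empty Quiz column (NaN)
df1 = pd.DataFrame({
    "Last Name": names,
    "First Name": names,
    "Number": [111111123, 111111114, 111111132, 111111145, 111111108],
    "Quiz": [np.nan] * 5,
})
# df2: source file with the scores to copy
df2 = pd.DataFrame({
    "Number": [111111145, 111111108, 111111123, 111111114, 111111132],
    "Quiz": [0, 1, 1, 0, 0],
})

# Both frames have a non-key "Quiz" column, so merge() keeps both and
# renames them with its default suffixes ("_x", "_y").
clashing = df1.merge(df2, on="Number")
print(list(clashing.columns))

# The answer's approach: slice off df1's last column (Quiz) before merging.
by_position = df1[df1.columns[:-1]].merge(df2, on="Number")
# Name-based alternative that works wherever Quiz sits in df1.
by_name = df1.drop(columns="Quiz").merge(df2, on="Number")

# merge() pairs rows by Number value; sorting afterwards is only cosmetic.
print(by_name.sort_values("Number").to_string(index=False))
```

Because rows are matched by value, neither input needs to be sorted first, and the index labels that caused the original problem play no role.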
I changed it just a little and got it to work. I used:
result = pd.merge(targetFile, sourceFile, on = 'number')
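For reuse in the script, the merge could be wrapped in a helper like the hypothetical `copy_column` below. This is a sketch under a few stated choices that go beyond the thread: `how="left"` keeps every target row (rows with no match get NaN), `validate="one_to_one"` makes pandas raise `MergeError` if a key value is duplicated in either file, and any existing copy of the column is dropped from the target first so no Quiz_x/Quiz_y pair appears.

```python
import pandas as pd


def copy_column(target: pd.DataFrame, source: pd.DataFrame,
                key: str, column: str) -> pd.DataFrame:
    """Hypothetical helper: attach source[column] to target by matching key."""
    # Drop any existing (e.g. empty) copy of the column from the target so
    # merge() does not produce <column>_x / <column>_y.
    base = target.drop(columns=[column], errors="ignore")
    # Bring over only the key and the wanted column. how="left" keeps every
    # target row (unmatched rows get NaN); validate="one_to_one" raises
    # pandas.errors.MergeError if a key value appears twice in either frame.
    merged = base.merge(source[[key, column]], on=key, how="left",
                        validate="one_to_one")
    # Sorting only tidies the output file; it does not affect the pairing.
    return merged.sort_values(key).reset_index(drop=True)
```

In the question's script, everything from `del targetFile[args.c]` onward could then become something like `copy_column(targetFile, sourceFile, 'UIN', args.c).to_csv(args.f, index=False)`, with the key name adjusted to whatever the CSV header actually is ('UIN', 'Number', or 'number'; pandas column names are case-sensitive).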