Compare two CSV files and update one CSV file based on the comparison result using Python
I'm new to Python and have written this code to compare two CSV files. The idea is: my 1st CSV file has only one column:
1st CSV
ColA
----
A
B
C
My 2nd CSV file looks like this:
Country   ColA   ColB
-------------------------
US        A
Ind       B
          C      AU K-
What I want is: if any record in ColA of the 1st CSV file matches ColA of the 2nd CSV file, parse ColB's "AU K-" value to get only "AU" and write it to the Country column of the 2nd CSV file.
So my 2nd CSV file (the output file) should look like this:
Country   ColA   ColB
---------------------
US        A
Ind       B
AU        C
I wrote the following code to find the matches. However, I'm not getting any output in the console while testing; the console window simply appears and disappears.
How can I keep the console open so that I can read the matches? Also, how do I update the 2nd CSV if the values match?
Below is my code sample showing how I'm doing this:
import pandas

with open('D:\Project\SourceFile.csv') as f:
    r = pandas.read_csv(f)
    with open('D:\Project\Searchfile.csv','r') as w:
        x = pandas.read_csv(w)
        col = w['ColA']
        for line in w:
            for col in w:
                for row in r:
                    if row in col:
                        print(line)
Note: I'm using IronPython on VS 2015 with Win7 64-bit. I'm using IronPython because I would like to integrate a few things with .NET code. However, I'm open to any normal/default Python tool.
It looks like IronPython or Python on Windows is somehow not reading the CSV file: I followed this ( http://pythonhow.com/data-analysis-with-python-pandas/ ) just to test, but nothing shows up. Not sure why. I checked the environment variables, and the Python folder, DLLs and Libs are added. What am I doing wrong? I hope I explained it correctly.
Please suggest. Thanks.
This is what it can look like with Pandas and NumPy (Pandas 0.15, Python 3):
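import pandas as pd
import numpy as np
r = pd.read_csv(r'D:\Project\SourceFile.csv')  # assumed: the 1st CSV (ColA only)
w = pd.read_csv(r'D:\Project\Searchfile.csv')  # assumed: the 2nd CSV (Country, ColA, ColB)
# np.where(condition, value_if_true, value_if_false)
w["Country"] = np.where(w["ColA"].isin(r["ColA"].tolist()) & pd.isnull(w["Country"]), w["ColB"], w["Country"])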
It assigns the value of ColB to Country in w only if the value of ColA in w is also in ColA of r and no country has been entered yet. Note that this copies the whole ColB value ("AU K-"); keeping only "AU" needs an extra step, such as taking the first whitespace-separated token.
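To make the condition concrete, here is a minimal end-to-end sketch on the sample data from the question. It assumes the files are comma-separated and that the country code is always the first whitespace-separated token of ColB. The inline strings `FIRST_CSV` and `SECOND_CSV` and the helper name `fill_country` are made up for illustration and stand in for the real files.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical inline stand-ins for the two CSV files in the question.
FIRST_CSV = "ColA\nA\nB\nC\n"
SECOND_CSV = "Country,ColA,ColB\nUS,A,\nInd,B,\n,C,AU K-\n"


def fill_country(r, w):
    """Return a copy of w with empty Country cells filled from ColB."""
    out = w.copy()
    # Rows whose ColA also appears in the 1st CSV and whose Country is empty.
    mask = out["ColA"].isin(r["ColA"]) & out["Country"].isnull()
    # Assumption: the code is the first token of ColB, e.g. "AU K-" -> "AU".
    code = out["ColB"].str.split().str[0]
    out["Country"] = np.where(mask, code, out["Country"])
    return out


r = pd.read_csv(io.StringIO(FIRST_CSV))
w = pd.read_csv(io.StringIO(SECOND_CSV))
updated = fill_country(r, w)
print(updated[["Country", "ColA"]])
```

With real files, `pd.read_csv` would take the file paths instead of `io.StringIO(...)`, and the result could be written back with `updated.to_csv(path, index=False)`.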
Hope it helps.
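On the console window that appears and disappears: one common workaround is to wait for a key press at the very end of the script. This is only a sketch; the helper name `pause_before_exit` and its `reader` parameter are made up for illustration, and it assumes the script runs in an interactive console.

```python
import sys


def pause_before_exit(prompt="Press Enter to exit...", reader=None):
    """Block until the user presses Enter so the console window stays open."""
    if reader is None:
        # Python 3 reads a line with input(); Python 2 (and IronPython 2.7)
        # needs raw_input() for the same behavior.
        reader = input if sys.version_info[0] >= 3 else raw_input
    reader(prompt)
```

Calling `pause_before_exit()` as the last line of the script keeps the window open until Enter is pressed. Running the script from a command prompt that is already open has the same effect without changing the code.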