
How can I use Python to merge dataframes or CSV files based on a key value?

I have two CSV files. The first one (master file) contains keys and their values, and the second one (daily file) contains the key and some other columns.

Example file (master file):

  Key  value
   A     1
   B     2
   C     3

Example file (daily file):

 Name  Key  date
 Red   A    dd/mm/yy
 Blue  B    dd/mm/yy
 Pink  C    dd/mm/yy

The outcome file I need looks like this:

 Name   Key   value   date
 Pink    C      3     dd/mm/yy
 Blue    B      2     dd/mm/yy
 Red     A      1     dd/mm/yy

I've tried using a dataframe and creating dicts from an external file or a dataframe, but I have no idea how to do the lookup based on the key and obtain its value.

Try this code; it may help you get your desired output.

# Import the library
import pandas as pd

# Build DataFrames that mimic the imported CSV files (master file and daily file)
master_data = {'key': ['A', 'B', 'C'], 'value': [1, 2, 3]}
daily_data = {'Name': ['Red', 'Blue', 'Pink'], 'key': ['A', 'B', 'C'], 'date': ['dd/mm/yy', 'dd/mm/yy', 'dd/mm/yy']}

masterfile = pd.DataFrame(data=master_data)
dailyfile = pd.DataFrame(data=daily_data)

# Merge the two DataFrames on the shared 'key' column
df = dailyfile.merge(masterfile, on='key')

# Sort by key in descending order and arrange the columns as in the expected output
result = df.sort_values(by='key', ascending=False)[['Name', 'key', 'value', 'date']]
result
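
If only the single value column is needed, a lookup with Series.map may be a lighter alternative to a full merge. The following is a minimal sketch under that assumption; it reuses the sample data from this answer, and the name lookup is introduced here purely for illustration.

```python
import pandas as pd

# Sample data mirroring the master and daily files from the question
masterfile = pd.DataFrame({'key': ['A', 'B', 'C'], 'value': [1, 2, 3]})
dailyfile = pd.DataFrame({
    'Name': ['Red', 'Blue', 'Pink'],
    'key': ['A', 'B', 'C'],
    'date': ['dd/mm/yy'] * 3,
})

# Build a key -> value lookup Series indexed by key
# (this assumes the keys in the master file are unique)
lookup = masterfile.set_index('key')['value']

# Map each daily key to its master value; keys with no match become NaN
dailyfile['value'] = dailyfile['key'].map(lookup)

# Arrange the columns as in the expected output
result = dailyfile[['Name', 'key', 'value', 'date']]
print(result)
```

A merge is usually the better fit once more than one column has to be carried over from the master file.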


I think the most basic answer to your question is given by w3schools.

How you first get the contents of your files into dicts is a separate matter that starts with reading the file. I think I would use readline() and then split() the resulting strings into key and value pairs for the dict().
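
The dict approach described above could look roughly like the sketch below. It assumes comma-separated files with a single header line, and it uses io.StringIO objects as stand-ins for the real files so that it runs on its own; the helper read_rows and the variable names are hypothetical.

```python
import io

# Stand-ins for the two files; with real files, use open('master.csv') and so on
master_file = io.StringIO("Key,value\nA,1\nB,2\nC,3\n")
daily_file = io.StringIO(
    "Name,Key,date\nRed,A,dd/mm/yy\nBlue,B,dd/mm/yy\nPink,C,dd/mm/yy\n"
)

def read_rows(handle):
    """Yield the fields of each non-empty line after the header."""
    handle.readline()  # skip the header line
    for line in handle:
        line = line.strip()
        if line:
            yield line.split(',')

# Build the key -> value dict from the master file
lookup = {key: value for key, value in read_rows(master_file)}

# Look up each daily key; dict.get returns None when a key is missing
merged = [
    (name, key, lookup.get(key), date)
    for name, key, date in read_rows(daily_file)
]

for row in merged:
    print(row)
```

A plain split(',') does not handle quoted fields that contain commas, and the values stay as strings; the standard csv module is the more robust choice for real files.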

If your CSV files are as you've described, you can do this very easily with Pandas by merging the records from both files that have matching Key values:

import pandas as pd
df1 = pd.read_csv('master.csv')
df2 = pd.read_csv('daily.csv')
df3 = df2.merge(df1, left_on='Key', right_on='Key')

This gives you a merged dataframe containing all rows with matching Keys:

   Name Key      date  value
0   Red   A  dd/mm/yy      1
1  Blue   B  dd/mm/yy      2
2  Pink   C  dd/mm/yy      3

If you want the columns in the order shown in your question, you can just add

df3 = df3[['Name', 'Key', 'value', 'date']]
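
The default merge is an inner join, so daily rows whose Key is missing from the master file are dropped. The sketch below shows one possible variation that keeps such rows, sorts by Key in descending order as in the question's expected output, and writes the result out as CSV. The row Grey,D is hypothetical data added only to show an unmatched key, and the io.StringIO buffers are stand-ins for master.csv, daily.csv, and an output file.

```python
import io
import pandas as pd

# Stand-ins for master.csv and daily.csv; "Grey,D" is a hypothetical unmatched row
master_csv = io.StringIO("Key,value\nA,1\nB,2\nC,3\n")
daily_csv = io.StringIO(
    "Name,Key,date\n"
    "Red,A,dd/mm/yy\nBlue,B,dd/mm/yy\nPink,C,dd/mm/yy\nGrey,D,dd/mm/yy\n"
)

df1 = pd.read_csv(master_csv)
df2 = pd.read_csv(daily_csv)

# how='left' keeps every daily row; validate raises if master keys are duplicated
df3 = df2.merge(df1, on='Key', how='left', validate='many_to_one')

# Reorder the columns and sort by Key in descending order
df3 = df3[['Name', 'Key', 'value', 'date']].sort_values('Key', ascending=False)

# Write the result as CSV; pass a file path instead of the buffer for a real file
out = io.StringIO()
df3.to_csv(out, index=False)
print(out.getvalue())
```

Because the unmatched row has no value, pandas stores NaN there and the value column becomes floating point.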
