
Python Pandas: how to update a csv file from another csv file

We have two CSV files: a.csv and b.csv.

a.csv has three columns: label, item1, item2. b.csv has two columns: item1, item2. If a row's item1 and item2 in a.csv also occur in b.csv, that is, a.csv and b.csv share the same item1 and item2, the value of label in a.csv should be set to 1. How can I do this with pandas?


For example:

a.csv:

label    item1     item2
 0         123       35
 0         342       721
 0         876       243

b.csv:

item1     item2
 12        35
 32        721
 876       243

result.csv:

label    item1     item2
 0         123       35
 0         342       721
 1         876       243

I tried this, but it doesn't work:

import pandas as pd

df1 = pd.read_csv("~/train_dataset.csv", names=['label', 'user_id', 'item_id', 'behavior_type', 'user_geohash', 'item_category', 'time','sales'], parse_dates=True)
df2 = pd.read_csv("~/train_user.csv", names=['user_id', 'item_id', 'behavior_type', 'user_geohash', 'item_category', 'time', 'sales'], parse_dates=True)
df1.loc[(df1['user_id'] == df2['user_id'])& (df1['item_id'] == df2['item_id']), 'label'] = 1

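You could use loc and a boolean condition to mask your df (here representing a.csv, with df1 representing b.csv) and set the label to 1 where that condition is met. The comparison is element-wise, so it matches rows that sit at the same position in both frames, as they do in the sample data: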

In [18]:

df.loc[(df['item1'] == df1['item1'])& (df['item2'] == df1['item2']), 'label'] = 1
df
Out[18]:
   label  item1  item2
0      0    123     35
1      0    342    721
2      1    876    243
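
To reproduce the output above without the CSV files, the two frames can be built inline. This is a minimal sketch using the sample data from the question; df stands in for a.csv and df1 for b.csv, and the name mask is introduced here only for readability.

```python
import pandas as pd

# Sample data copied from the question: df plays the role of a.csv, df1 of b.csv.
df = pd.DataFrame({"label": [0, 0, 0],
                   "item1": [123, 342, 876],
                   "item2": [35, 721, 243]})
df1 = pd.DataFrame({"item1": [12, 32, 876],
                    "item2": [35, 721, 243]})

# Element-wise comparison: row i of df is compared with row i of df1.
# This requires both frames to have the same index labels.
mask = (df["item1"] == df1["item1"]) & (df["item2"] == df1["item2"])

# Set label to 1 only on the rows where both columns match.
df.loc[mask, "label"] = 1
print(df)
```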

If you want to set all row values you could use np.where:

In [19]:

np.where((df['item1'] == df1['item1'])& (df['item2'] == df1['item2']), 1, 0)
Out[19]:
array([0, 0, 1])
In [20]:

df['label'] = np.where((df['item1'] == df1['item1'])& (df['item2'] == df1['item2']), 1, 0)
df
Out[20]:
   label  item1  item2
0      0    123     35
1      0    342    721
2      1    876    243
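
The element-wise comparisons above line rows up by position, which works for the sample data because the matching pair sits on the same row in both files. If the matching rows can appear anywhere in b.csv, one alternative (not part of the original answer) is a left merge with indicator=True. The sketch below assumes the column names from the question; the helper label_matches and the shuffled sample frame b are hypothetical.

```python
import pandas as pd

def label_matches(a, b, keys=("item1", "item2")):
    """Return a copy of `a` with label set to 1 where the key pair occurs anywhere in `b`."""
    keys = list(keys)
    # Deduplicate b's key pairs so the left merge cannot multiply rows of a.
    b_keys = b[keys].drop_duplicates()
    # A left merge keeps every row of a, in a's order; _merge is "both" for matched rows.
    merged = a.merge(b_keys, on=keys, how="left", indicator=True)
    out = a.copy()
    # Use a plain boolean array so the mask is positional and ignores index labels.
    out.loc[(merged["_merge"] == "both").to_numpy(), "label"] = 1
    return out

# Same values as the question's example, but with b's rows in a different order.
a = pd.DataFrame({"label": [0, 0, 0],
                  "item1": [123, 342, 876],
                  "item2": [35, 721, 243]})
b = pd.DataFrame({"item1": [876, 12, 32],
                  "item2": [243, 35, 721]})

result = label_matches(a, b)
print(result)
```

With real files, a and b would come from pd.read_csv, and result.to_csv("result.csv", index=False) would write the updated table without the index column.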
