Python Pandas: how to update a csv file from another csv file
We have two CSV files: a.csv and b.csv.
a.csv has three columns: label, item1, item2. b.csv has two columns: item1, item2. If an item1 and item2 pair in a.csv also occurs in b.csv, that is, a.csv and b.csv have the same item1 and item2, the value of label in a.csv should be set to 1. How can this be done with pandas?
For example:
a.csv:
label item1 item2
0 123 35
0 342 721
0 876 243
b.csv:
item1 item2
12 35
32 721
876 243
result.csv:
label item1 item2
0 123 35
0 342 721
1 876 243
I tried this, but it doesn't work:
import pandas as pd
df1 = pd.read_csv("~/train_dataset.csv", names=['label', 'user_id', 'item_id', 'behavior_type', 'user_geohash', 'item_category', 'time','sales'], parse_dates=True)
df2 = pd.read_csv("~/train_user.csv", names=['user_id', 'item_id', 'behavior_type', 'user_geohash', 'item_category', 'time', 'sales'], parse_dates=True)
df1.loc[(df1['user_id'] == df2['user_id'])& (df1['item_id'] == df2['item_id']), 'label'] = 1
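You could use loc and a boolean condition to mask your df (here representing a.csv, with df1 representing b.csv) and set the label to 1 where that condition is met. Note that this compares the two frames row by row, so it requires both to have the same number of rows and the same index: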
In [18]:
df.loc[(df['item1'] == df1['item1'])& (df['item2'] == df1['item2']), 'label'] = 1
df
Out[18]:
label item1 item2
0 0 123 35
1 0 342 721
2 1 876 243
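The session above does not show how df and df1 were built. A minimal sketch that reproduces it from the sample data in the question might look like the following; it assumes the files are whitespace-separated with a header row, and the names a_text, b_text and mask are only illustrative.

```python
import io
import pandas as pd

# Sample data copied from the question. The real files are assumed to be
# whitespace-separated with a header row (an assumption, not stated above).
a_text = """label item1 item2
0 123 35
0 342 721
0 876 243
"""
b_text = """item1 item2
12 35
32 721
876 243
"""

df = pd.read_csv(io.StringIO(a_text), sep=r"\s+")   # stands in for a.csv
df1 = pd.read_csv(io.StringIO(b_text), sep=r"\s+")  # stands in for b.csv

# Row-by-row comparison: row i of df is compared only with row i of df1.
mask = (df["item1"] == df1["item1"]) & (df["item2"] == df1["item2"])
df.loc[mask, "label"] = 1
print(df)
```

Because the comparison is positional, a pair that appears in b.csv on a different row than in a.csv would not be flagged by this approach.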
If you want to set all row values, you could use np.where:
In [19]:
np.where((df['item1'] == df1['item1'])& (df['item2'] == df1['item2']), 1, 0)
Out[19]:
array([0, 0, 1])
In [20]:
df['label'] = np.where((df['item1'] == df1['item1'])& (df['item2'] == df1['item2']), 1, 0)
df
Out[20]:
label item1 item2
0 0 123 35
1 0 342 721
2 1 876 243
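If the two files do not have the same number of rows, or matching pairs can sit on different rows, the element-wise comparison above no longer applies. One alternative, which is a different technique from the one in this answer, is a left merge with indicator=True. The sketch below uses small hand-built frames named a and b as stand-ins for the two files; the extra row in b is made up to show that row order and length do not matter.

```python
import pandas as pd

# Hypothetical stand-ins for a.csv and b.csv.
a = pd.DataFrame({"label": [0, 0, 0],
                  "item1": [123, 342, 876],
                  "item2": [35, 721, 243]})
# b deliberately has a different row order and length than a.
b = pd.DataFrame({"item1": [876, 12, 32, 5],
                  "item2": [243, 35, 721, 9]})

# Left-merge on both key columns. The _merge column is "both" where the
# (item1, item2) pair of a also exists somewhere in b.
# drop_duplicates keeps the row count of a unchanged if b repeats a pair.
merged = a.merge(b.drop_duplicates(), on=["item1", "item2"],
                 how="left", indicator=True)

# A left merge keeps the row order of a, so the mask lines up by position.
matched = (merged["_merge"] == "both").to_numpy()
a.loc[matched, "label"] = 1
print(a)
```

Rows of a without a match keep their existing label, which mirrors the loc variant rather than the np.where variant.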