Adding a new column to a dataframe based on the values of another dataframe

I have two CSV files, and I am using pandas to read the data.

train.csv contains the following values, with headers id, sentiment:

87,Positive
10,Positive
7,Neutral

text.csv contains the following values, with headers id, text:

7,hello, I think the price if high...
87, you can call me tomorow...
....

I would like to insert the text from text.csv into train.csv, so that the result would be:

87,Positive, you can call me tomorow...

Can anyone help me do this with pandas?

import pandas as pd

train = pd.read_csv("train.csv")
text = pd.read_csv("text.csv")

# this does not work
combined = pd.merge(train, text, on=['id'])

Note: some ids may not be present in both files, so I need to set null if the id does not exist.

Set the index to id on both dataframes, then add the columns:

train.set_index('id').sentiment + text.set_index('id').text
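
Worth noting: `+` on two string Series aligns them on the index and concatenates the values, so the expression above yields a single Series of joined strings rather than two separate columns. Below is a minimal sketch of an index-based variant that keeps `sentiment` and `text` apart by using `DataFrame.join` (a swapped-in technique, not the exact code from this answer). The in-memory frames are hypothetical stand-ins for train.csv and text.csv.

```python
import pandas as pd

# Hypothetical in-memory stand-ins for train.csv and text.csv
train = pd.DataFrame({"id": [87, 10, 7],
                      "sentiment": ["Positive", "Positive", "Neutral"]})
text = pd.DataFrame({"id": [7, 87],
                     "text": ["hello, I think the price if high...",
                              " you can call me tomorow..."]})

# The expression from the answer: strings are concatenated per id,
# and ids missing from either side come out as NaN
concatenated = train.set_index("id").sentiment + text.set_index("id").text

# join aligns on the index and defaults to a left join, so every row of
# train is kept and ids missing from text get NaN in the text column
joined = train.set_index("id").join(text.set_index("id")).reset_index()
print(joined)
```

With `join`, id 10 (which has no entry in `text`) stays in the result with a missing `text` value, which is the "set null" behaviour asked for in the question.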

One easy way is:

pd.merge(train, text, on='id', how='outer')

As per the pandas docs, if you pass how='outer', the result keeps every key from both dataframes, and values missing on either side are filled with NaN.
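
To make the difference between join types concrete, here is a small sketch with in-memory frames standing in for the two CSV files (the extra id 99 in `text` is made up for illustration). `how='outer'` keeps ids from both frames, while `how='left'` keeps exactly the rows of `train`; both fill the missing side with NaN. Which one fits depends on whether ids that appear only in text.csv should show up in the result.

```python
import pandas as pd

# Hypothetical in-memory stand-ins for train.csv and text.csv;
# id 99 is invented here to show a key that exists only in text
train = pd.DataFrame({"id": [87, 10, 7],
                      "sentiment": ["Positive", "Positive", "Neutral"]})
text = pd.DataFrame({"id": [7, 87, 99],
                     "text": ["hello, I think the price if high...",
                              " you can call me tomorow...",
                              "an id that is not in train"]})

# Outer join: union of the ids from both frames
outer = pd.merge(train, text, on="id", how="outer")

# Left join: only the ids from train, in train's row order
left = pd.merge(train, text, on="id", how="left")

print(outer)
print(left)
```

In this sketch the left join returns three rows with a missing `text` for id 10, while the outer join adds a fourth row for id 99 with a missing `sentiment`.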

