Comparing two pandas DataFrames and writing new DataFrames where a row value is common to both

First of all I am going to explain the whole problem; if there is a better way to do this without pandas, please say. I have attempted a bunch of ways and I feel like pandas is likely the best way to go.

I have two text files. Each text file looks something like the following:

Sometextinbothfiles    UniqueText    SomeTextThatCouldbeCommon    Unique Text

There are more columns containing UniqueText, but this gives the basic idea of the layout. There is also some header information, which is easy to remove by skipping the first 22 lines in pandas. The SomeTextThatCouldbeCommon column is always in the same position, and it is this column I want to compare. It holds a filename.

Currently I am reading each text file and splitting it into columns in pandas using:

import pandas as pd
# pandas opens and closes the file itself; range(0, 23) skips lines 0-22, i.e. the first 23 lines
Datapd = pd.read_csv("data.star", sep=r"\s+", skiprows=range(0, 23), header=None)
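With header=None, pandas labels the columns 0, 1, 2, ... by position, so the SomeTextThatCouldbeCommon column (the third field in the layout above) should simply be column 2. A minimal sketch, assuming that layout; the in-memory sample and its two header lines are made up to stand in for data.star:

```python
import io

import pandas as pd

# Hypothetical stand-in for data.star: two header lines, then whitespace-separated rows
sample = io.StringIO(
    "data_\n"
    "loop_\n"
    "a1  u1  file_001.mrc  x1\n"
    "a2  u2  file_002.mrc  x2\n"
)

# header=None -> columns are labelled 0, 1, 2, ... by position
df = pd.read_csv(sample, sep=r"\s+", skiprows=2, header=None)

# The filename column is the third field, i.e. column label 2
filenames = df[2]
print(filenames.tolist())  # ['file_001.mrc', 'file_002.mrc']
```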

I want to compare the SomeTextThatCouldbeCommon value on each line of one text file with the SomeTextThatCouldbeCommon value on EVERY line of the other text file. If there is a match, I want to write out that whole line to a new DataFrame/text file/array. I then want to do the same in reverse. In the end I would have two files that refer to the same set of filenames, each keeping the unique data its own file holds about them.
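One way to do this is to filter each DataFrame with `Series.isin`, keeping only the rows whose filename also appears somewhere in the other file's filename column. Unlike a cell-by-cell comparison, this checks every row against every row of the other file. A minimal sketch, assuming both files have been read with header=None so the filename sits in column 2; the small frames are hypothetical stand-ins for the two parsed files:

```python
import pandas as pd

# Hypothetical frames standing in for the two parsed text files.
# With header=None the filename (SomeTextThatCouldbeCommon) is in column 2.
df_a = pd.DataFrame([
    ["s1", "uA1", "img_01.mrc", "uA2"],
    ["s1", "uA3", "img_02.mrc", "uA4"],
    ["s1", "uA5", "img_03.mrc", "uA6"],
])
df_b = pd.DataFrame([
    ["s1", "uB1", "img_03.mrc", "uB2"],
    ["s1", "uB3", "img_04.mrc", "uB4"],
    ["s1", "uB5", "img_01.mrc", "uB6"],
])

KEY = 2  # position of the filename column (assumed from the layout above)

# Rows of each file whose filename appears anywhere in the other file
common_a = df_a[df_a[KEY].isin(df_b[KEY])]
common_b = df_b[df_b[KEY].isin(df_a[KEY])]

# Without a path, to_csv returns the text; pass a (hypothetical) filename
# such as "common_1.txt" as the first argument to write it to disk instead.
out_a = common_a.to_csv(sep="\t", header=False, index=False)
out_b = common_b.to_csv(sep="\t", header=False, index=False)

print(common_a[KEY].tolist())  # ['img_01.mrc', 'img_03.mrc']
print(common_b[KEY].tolist())  # ['img_03.mrc', 'img_01.mrc']
```

Each output keeps its own file's unique columns, and row order follows the original file, which matches the "do the same in reverse" step.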

I hope I have explained this clearly. Any help is appreciated; I am struggling to figure out how to do this.

Hi mate, here is a simple example that should help with your problem; I hope it works for you:

Two example DataFrames:

import pandas as pd

# Dates are quoted: unquoted, 2013-11-24 would be evaluated as the integer 1978
df1 = pd.DataFrame({
    "Date": ["2013-11-24", "2013-11-24", "2013-11-24", "2013-11-24"],
    "Fruit": ["Banana", "Orange", "Apple", "Celery"],
    "Num": [22.1, 8.6, 7.6, 10.2],
    "Color": ["Yellow", "Orange", "Green", "Green"],
})

df2 = pd.DataFrame({
    "Date": ["2013-11-25", "2013-11-24", "2013-11-24", "2018-11-24"],
    "Fruit": ["Banana", "Cherry", "Mango", "Celery"],
    "Num": [22.1, 8.6, 7.6, 10.2],
    "Color": ["Yellow", "Green", "Yellow", "Green"],
})

# Element-wise comparison; df1 and df2 must have identical row and column labels
mask = (df1 == df2)
print(df1.where(mask))

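Where the two frames hold the same value at the same position you get that value; everywhere else you get NaN. Note that this compares the frames cell by cell by position (row i of df1 with row i of df2), so a value that appears in both frames on different rows is not treated as a match.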
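To match on a single column regardless of row order, an inner merge on that column is another option (a different technique from the `where` mask above). It pairs each row of df1 with every row of df2 that has the same key and keeps both sides' other columns, distinguished by suffixes. A minimal sketch on the example frames, with the dates written as strings; the suffixes `_1`/`_2` are arbitrary choices:

```python
import pandas as pd

# The example frames from above, with the dates written as strings
df1 = pd.DataFrame({
    "Date": ["2013-11-24", "2013-11-24", "2013-11-24", "2013-11-24"],
    "Fruit": ["Banana", "Orange", "Apple", "Celery"],
    "Num": [22.1, 8.6, 7.6, 10.2],
    "Color": ["Yellow", "Orange", "Green", "Green"],
})
df2 = pd.DataFrame({
    "Date": ["2013-11-25", "2013-11-24", "2013-11-24", "2018-11-24"],
    "Fruit": ["Banana", "Cherry", "Mango", "Celery"],
    "Num": [22.1, 8.6, 7.6, 10.2],
    "Color": ["Yellow", "Green", "Yellow", "Green"],
})

# Inner merge on the key column: one output row per matching pair,
# the remaining columns from each side kept with _1 / _2 suffixes
matched = df1.merge(df2, on="Fruit", how="inner", suffixes=("_1", "_2"))
print(matched[["Fruit", "Date_1", "Date_2"]])

# Row order of df2 does not matter: reversing it finds the same fruits
shuffled = df1.merge(df2.iloc[::-1], on="Fruit", how="inner")
print(sorted(shuffled["Fruit"]))  # ['Banana', 'Celery']
```

A merge produces one combined table; to get two separate per-file outputs instead, filtering each frame with `isin` on the key column keeps the files apart.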
