
Combine two CSV files in Python, taking only a specified number of rows

I want to combine the data from two CSV files, but not all of it. For example: a.csv + b.csv, where b.csv has 20 rows. I want to take only the first 10 rows from it, and then rows 11-20. In other words, the first 10 and the second 10.

Then I want to insert the first 10 rows into a.csv, and the second 10 rows into a.csv too. My question is: how can I take only a specific number of rows?

Here is my code:

import pandas as pd

df1 = pd.read_csv('testNegatif.csv')
df2 = pd.read_csv('trainNegatif.csv', nrows=10)

output=df1.append(df2)
output.to_csv("output.csv", sep=',')

I expect to get only the rows I asked for, but the actual result combines all of the data.

Without using Pandas. Read the lines of each file; add ten lines from one file's data to the other; write the result to another file.

with open('a.csv') as f:
    data = f.readlines()
with open('b.csv') as f:
    bdata = f.readlines()

data.extend(bdata[:10])

with open('output.csv', 'w') as f:
    f.writelines(data)

If the files are HUGE and you don't want to read the entire contents into memory, use some itertools functions.

import itertools
with open('a.csv') as a, open('b.csv') as b, open('output.csv', 'w') as out:
    first_ten = itertools.islice(b, 10)
    for line in itertools.chain(a, first_ten):
        out.write(line)
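The question also asks for the second ten rows (11-20). A minimal sketch of how the same idea could cover that case: itertools.islice accepts a start and a stop index, so the slice does not have to begin at the first line. The function name append_row_range and its parameters are hypothetical, chosen only for illustration. It assumes a.csv ends with a newline and that b.csv has no header line (with a header, shift both indices by one).

```python
import itertools


def append_row_range(a_path, b_path, out_path, start, stop):
    """Write all of a_path, then lines start..stop-1 (0-based) of b_path, to out_path."""
    with open(a_path) as a, open(b_path) as b, open(out_path, 'w') as out:
        # islice consumes and discards the first `start` lines lazily,
        # then yields lines up to (but not including) index `stop`.
        chunk = itertools.islice(b, start, stop)
        out.writelines(itertools.chain(a, chunk))
```

Under those assumptions, append_row_range('a.csv', 'b.csv', 'output.csv', 0, 10) would take the first ten lines of b.csv, and append_row_range('a.csv', 'b.csv', 'output.csv', 10, 20) the second ten.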

Assumes both files have the same number of columns.

import pandas as pd

# Create two data sets that overlap, so we don't want all of the 'b' data.
# The values 3, 4, 5 already exist in 'a', so we strip them from 'b'.
# ----------Creating the data frames----------
a = [1, 2, 3, 4, 5]
b = [3, 4, 5, 6, 7, 8, 9, 10]

dfa = pd.DataFrame(a)
dfa.to_csv('one.csv', index=False)

dfb = pd.DataFrame(b)
dfb.to_csv('two.csv', index=False)
# ---------------------------------------------

# --------Reading the CSV files back in--------
one = pd.read_csv('one.csv')
two = pd.read_csv('two.csv')
# ---------------------------------------------

# Drop the first 3 rows of 'two', then append the rest to 'one'.
# pd.concat is used because DataFrame.append was removed in pandas 2.0.
output = pd.concat([one, two[3:]])
output.to_csv("output.csv", sep=',', index=False)
# ---------------------------------------------

I hope this answers your question. The important part for you is output = pd.concat([one, two[3:]]) . There are more sophisticated ways to do the same thing, but this is the simplest.

As mentioned in my comment, you can use nrows

import pandas as pd

df1 = pd.read_csv('testNegatif.csv')
df2 = pd.read_csv('trainNegatif.csv', nrows=10)

# pd.concat is used because DataFrame.append was removed in pandas 2.0.
output = pd.concat([df1, df2])
output.to_csv("output.csv", sep=',')

See: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html for more options
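nrows on its own only covers the first ten rows. For the second ten, one option is to combine it with the skiprows parameter of read_csv. The sketch below assumes each file has a single header line; the helper names read_row_block and combine are hypothetical, not part of pandas.

```python
import pandas as pd


def read_row_block(path, start, count):
    """Read `count` data rows from a CSV after skipping the first `start` data rows."""
    # File line 0 is the header, so data row i sits on file line i + 1.
    # Skipping file lines 1..start keeps the header and drops `start` data rows.
    return pd.read_csv(path, skiprows=range(1, start + 1), nrows=count)


def combine(a_path, b_path, out_path, start, count):
    """Append the selected block of b_path to all rows of a_path and save the result."""
    a = pd.read_csv(a_path)
    block = read_row_block(b_path, start, count)
    # ignore_index renumbers the rows; index=False keeps the index out of the file.
    pd.concat([a, block], ignore_index=True).to_csv(out_path, index=False)
```

With these helpers, combine('testNegatif.csv', 'trainNegatif.csv', 'output.csv', start=10, count=10) would append rows 11-20 of trainNegatif.csv.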
