
Combine 2 CSV files using Python with a specified number of rows

I want to combine the data from two CSV files, but not all of it. For example, a.csv + b.csv, where b.csv has 20 rows. I want to take only the first 10 rows, and then rows 11-20; that is, the first 10 and the second 10.

Then I want to insert the first 10 rows into a.csv, and the second 10 rows into a.csv as well. My question is: how can I take only a specific number of rows?

Here is my code:

import pandas as pd

df1 = pd.read_csv('testNegatif.csv')
df2 = pd.read_csv('trainNegatif.csv', nrows=10)

output=df1.append(df2)
output.to_csv("output.csv", sep=',')

I expected to get only the rows I want, but the actual result combines all the data.

Without using pandas: read the lines of each file, add ten lines from one file's data to the other, and write the result to another file.

with open('a.csv') as f:
    data = f.readlines()
with open('b.csv') as f:
    bdata = f.readlines()

data.extend(bdata[:10])

with open('output.csv', 'w') as f:
    f.writelines(data)
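
The question also asks for the second batch (rows 11-20). A minimal sketch of the same line-based approach with an explicit start and stop is shown here. The helper name append_rows, the sample file contents, and the output file names are hypothetical, and the sketch assumes neither file has a header line.

```python
import os
import tempfile


def append_rows(base_lines, extra_lines, start, stop):
    """Return base_lines followed by extra_lines[start:stop] (0-based, stop excluded)."""
    # Make sure the last base line ends with a newline so rows do not run together.
    if base_lines and not base_lines[-1].endswith("\n"):
        base_lines = base_lines[:-1] + [base_lines[-1] + "\n"]
    return base_lines + extra_lines[start:stop]


# Hypothetical sample data so the sketch runs on its own.
workdir = tempfile.mkdtemp()
a_path = os.path.join(workdir, "a.csv")
b_path = os.path.join(workdir, "b.csv")
with open(a_path, "w") as f:
    f.write("a1\na2\n")
with open(b_path, "w") as f:
    f.writelines(f"b{i}\n" for i in range(1, 21))

with open(a_path) as f:
    data = f.readlines()
with open(b_path) as f:
    bdata = f.readlines()

first_batch = append_rows(data, bdata, 0, 10)    # a.csv + rows 1-10 of b.csv
second_batch = append_rows(data, bdata, 10, 20)  # a.csv + rows 11-20 of b.csv

with open(os.path.join(workdir, "output_1.csv"), "w") as f:
    f.writelines(first_batch)
with open(os.path.join(workdir, "output_2.csv"), "w") as f:
    f.writelines(second_batch)
```

If b.csv does have a header line, the slices would need to shift by one (for example 1 to 11 and 11 to 21).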

If the files are huge and you don't want to read their entire contents into memory, use some itertools functions.

import itertools
with open('a.csv') as a, open('b.csv') as b, open('output.csv', 'w') as out:
    first_ten = itertools.islice(b, 10)
    for line in itertools.chain(a, first_ten):
        out.write(line)
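
For the second batch, itertools.islice also accepts a start and a stop, so rows 11-20 can be streamed the same way. The following is only a sketch: the function name copy_with_slice and the sample files are hypothetical, and it assumes the files have no header line.

```python
import itertools
import os
import tempfile


def copy_with_slice(a_path, b_path, out_path, start, stop):
    """Write all of a_path, then lines start..stop-1 (0-based) of b_path, to out_path."""
    with open(a_path) as a, open(b_path) as b, open(out_path, "w") as out:
        # islice skips the first `start` lines lazily without storing them.
        chunk = itertools.islice(b, start, stop)
        for line in itertools.chain(a, chunk):
            out.write(line)


# Hypothetical sample data so the sketch runs on its own.
workdir = tempfile.mkdtemp()
a_path = os.path.join(workdir, "a.csv")
b_path = os.path.join(workdir, "b.csv")
with open(a_path, "w") as f:
    f.write("a1\na2\n")
with open(b_path, "w") as f:
    f.writelines(f"b{i}\n" for i in range(1, 21))

out_first = os.path.join(workdir, "output_first.csv")
out_second = os.path.join(workdir, "output_second.csv")
copy_with_slice(a_path, b_path, out_first, 0, 10)    # rows 1-10 of b.csv
copy_with_slice(a_path, b_path, out_second, 10, 20)  # rows 11-20 of b.csv
```

Only one line is held in memory at a time, although the skipped lines of b.csv are still read from disk.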

This assumes both files have the same number of columns.

import pandas as pd
# Creating two dataframes with data that overlap, so we don't want all of the 'b' data.
# We want to strip off '3,4,5' as they exist in 'a' as well
# ----------Creating the data frames----------
a = [1,2,3,4,5]
b = [3,4,5,6,7,8,9,10]

dfa = pd.DataFrame(a)
dfa.to_csv('one.csv', index=False)

dfb = pd.DataFrame(b)
dfb.to_csv('two.csv', index = False)
# ---------------------------------------------

# --------Reading through the dataframes-------
one = pd.read_csv('one.csv')
two = pd.read_csv('two.csv')
# ---------------------------------------------

# Drop the first 3 rows of 'two' (3, 4, 5), which already exist in 'one'.
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job.
output = pd.concat([one, two[3:]])
output.to_csv("output.csv", sep=',', index=False)
# ---------------------------------------------

I hope this answers your question. The important part for you is output = pd.concat([one, two[3:]]). There are more sophisticated ways to do the same thing, but this is the simplest.
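
Applied to the question's case of 20 rows split into two batches of 10, the same slicing idea might look like the sketch below. The column name "value" and the sample numbers are made up for illustration, and pd.concat is used because DataFrame.append is not available in pandas 2.0 and later.

```python
import pandas as pd

# Hypothetical stand-ins for a.csv (3 rows) and b.csv (20 rows).
one = pd.DataFrame({"value": [1, 2, 3]})
two = pd.DataFrame({"value": list(range(101, 121))})

first_ten = two.iloc[:10]      # rows 1-10 of 'two'
second_ten = two.iloc[10:20]   # rows 11-20 of 'two'

# ignore_index=True renumbers the combined rows from 0.
with_first = pd.concat([one, first_ten], ignore_index=True)
with_second = pd.concat([one, second_ten], ignore_index=True)
```

Each result can then be written out with to_csv(..., index=False), as in the answer above.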

As mentioned in my comment, you can use nrows:

import pandas as pd

df1 = pd.read_csv('testNegatif.csv')
df2 = pd.read_csv('trainNegatif.csv', nrows=10)

# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement.
output = pd.concat([df1, df2])
output.to_csv("output.csv", sep=',')

See https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html for more options.
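
One of those options, skiprows, can be combined with nrows to read the second batch directly from the file. This is a sketch only: the temporary file and the column name "value" are made up, and it assumes the CSV has a single header line, which skiprows=range(1, 11) leaves in place while skipping the first 10 data rows.

```python
import os
import tempfile

import pandas as pd

# Hypothetical 20-row file standing in for trainNegatif.csv.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "trainNegatif.csv")
pd.DataFrame({"value": range(1, 21)}).to_csv(path, index=False)

# Rows 1-10: stop after 10 data rows.
first_ten = pd.read_csv(path, nrows=10)

# Rows 11-20: skip file lines 1..10 (line 0 is the header), then read 10 rows.
second_ten = pd.read_csv(path, skiprows=range(1, 11), nrows=10)
```

Either frame can then be combined with the first file using pd.concat.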
