Recently, in one of my projects I had a small task: split a list into multiple smaller lists and save each one as a file in Amazon S3. In this article, I will share the solution along with the code.
The straightforward solution is the following piece of code.
def divide_chunks(l, n):
    # yield successive n-sized chunks of l
    for i in range(0, len(l), n):
        yield l[i:i + n]
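As a quick sanity check, here is how divide_chunks behaves on a small list (the sample values are just for illustration):

```python
def divide_chunks(l, n):
    # yield successive n-sized chunks of l
    for i in range(0, len(l), n):
        yield l[i:i + n]

chunks = list(divide_chunks([1, 2, 3, 4, 5], 2))
print(chunks)  # → [[1, 2], [3, 4], [5]]
```

Note that the last chunk can be shorter than n when the list length is not an exact multiple.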
Let us assume I have a file named data.json with a format similar to the following.
{
    "data": ["123", "12312312", "123sdfsdf", "Aadsfasdf", "asfdasdfa", "asdfaafdavasdf"]
}
Read the file and pass the list to the divide_chunks function along with the chunk size you want.
import json

with open('data.json', 'r') as file:
    data = json.loads(file.read())

n = 2000
data_list = list(divide_chunks(data['data'], n))
Now all you need to do is loop over data_list and save each chunk as a file or upload it to S3.
fileCount = 1
for item in data_list:
    temp = {
        "data": item
    }
    with open(f'data{fileCount}.json', 'w') as temp_file:
        temp_file.write(json.dumps(temp))
    fileCount += 1
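The manual counter can also be handled with Python's built-in enumerate. A minimal, self-contained sketch of the same loop (data_list here is a stand-in for the chunks produced earlier):

```python
import json

# stand-in chunks, just for demonstration
data_list = [["a", "b"], ["c", "d"]]

for fileCount, item in enumerate(data_list, start=1):
    # write each chunk as data1.json, data2.json, ...
    with open(f'data{fileCount}.json', 'w') as temp_file:
        json.dump({"data": item}, temp_file)
```

enumerate(..., start=1) keeps the same 1-based file names while removing the separate counter variable.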
Final code
You can find the following code on my GitHub repository.
import json

# uncomment if you want to upload to S3
# import boto3
# s3Client = boto3.resource('s3')
# bucketName = 'BUCKET_NAME'
# s3Folder = 'data_{}.json'

with open('data.json', 'r') as file:
    data = json.loads(file.read())

def divide_chunks(l, n):
    # yield successive n-sized chunks of l
    for i in range(0, len(l), n):
        yield l[i:i + n]

n = 2000
data_list = list(divide_chunks(data['data'], n))

fileCount = 1
for item in data_list:
    temp = {
        "data": item
    }
    # uncomment the line below if you want to upload to S3
    # s3Client.Object(bucketName, s3Folder.format(fileCount)).put(Body=json.dumps(temp, indent=4))
    with open(f'data{fileCount}.json', 'w') as temp_file:
        temp_file.write(json.dumps(temp))
    fileCount += 1
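If you do take the S3 path, the commented lines expand to something like the sketch below. The bucket name and key template are placeholders, and the upload itself assumes the boto3 package is installed and AWS credentials are configured:

```python
import json

def chunk_key(template, i):
    # build the S3 object key for chunk i, e.g. 'data_1.json'
    return template.format(i)

def upload_chunks(data_list, bucket_name, key_template='data_{}.json'):
    # requires boto3 and valid AWS credentials; bucket_name is a placeholder
    import boto3
    s3 = boto3.resource('s3')
    for i, item in enumerate(data_list, start=1):
        body = json.dumps({"data": item}, indent=4)
        s3.Object(bucket_name, chunk_key(key_template, i)).put(Body=body)
```

With an S3 resource, Object(bucket, key).put(Body=...) uploads the string directly, so there is no need to write a temporary file first.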
Happy programming!!